Value Iteration for Long-Run Average Reward in Markov Decision Processes.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Publication status: Published
Publication date: 13/07/2017
Host publication: CAV 2017: Computer Aided Verification
Editors: R. Majumdar, V. Kunčak
Place of publication: Cham
Publisher: Springer
Pages: 201-221
Number of pages: 21
ISBN (electronic): 9783319633879
ISBN (print): 9783319633862
Original language: English

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 10426
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long-term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other objectives, such as reachability. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work, our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical VI-based algorithms for MDPs with long-run average rewards. First, we show that a combination of applying VI locally in each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending this approach with a simulation-guided on-demand variant of VI, we obtain an anytime algorithm that can handle very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
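For intuition, the classical ingredient that the per-MEC step builds on can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name gain_vi, the arrays P and r, and the damping weight tau are our own choices, and the span-based stopping criterion used here is only guaranteed to converge for communicating MDPs, i.e. inside a single maximal end-component.

    import numpy as np

    def gain_vi(P, r, tau=0.5, eps=1e-6, max_iter=1_000_000):
        """Classical value iteration for the long-run average reward (gain)
        of a communicating MDP, e.g. a single maximal end-component.

        P   -- transitions, shape (S, A, S): P[s, a, t] = Pr(t | s, a)
        r   -- one-step rewards, shape (S, A)
        tau -- aperiodicity-transformation weight in (0, 1); mixing each
               step with a self-loop makes the value differences converge
        Returns (lo, hi) with lo <= optimal gain <= hi.
        """
        S, A, _ = P.shape
        v = np.zeros(S)
        lo, hi = -np.inf, np.inf
        for _ in range(max_iter):
            # Bellman backup on the tau-damped MDP; the gain is unchanged
            # because the stationary distributions are the same.
            q = r + tau * (P @ v)               # shape (S, A)
            v_new = (1.0 - tau) * v + q.max(axis=1)
            diff = v_new - v
            lo, hi = diff.min(), diff.max()     # classical gain bounds
            v = v_new
            if hi - lo < eps:                   # span small: gain bracketed
                break
        return lo, hi

    # Toy model (our own, for illustration): two states; in s0 the action
    # "move" pays 1 and goes to s1, "stay" pays 0; s1 mirrors this with
    # zero reward. The optimal gain is 0.5 (alternate s0 -> s1 -> s0).
    P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # s0: stay, move
                  [[0.0, 1.0], [1.0, 0.0]]])   # s1: stay, move
    r = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    print(gain_vi(P, r))                        # roughly (0.5, 0.5)

The abstract's point is precisely that this stopping criterion has no known counterpart for general (multi-MEC) MDPs: the sketch above is only sound within one MEC, which is why the presented algorithms run VI locally per MEC and then combine the resulting gains via VI for a reachability objective.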