Anytime Guarantees for Reachability in Uncountable Markov Decision Processes.

LANCASTER UNIVERSITY LEIPZIG

Text available via DOI:

https://doi.org/10.4230/LIPIcs.CONCUR.2022.11
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Markov decision process, Uncountable system, anytime guarantee, discrete-time Markov control process, probabilistic verification


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Kush Grover
Jan Křetínský
Tobias Meggendorfer
Maximilian Weininger


Publication date	6/09/2022
Host publication	33rd International Conference on Concurrency Theory, CONCUR 2022
Editors	Bartek Klin, Slawomir Lasota, Anca Muscholl
Publisher	Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Pages	11:1-11:20
Number of pages	20
ISBN (electronic)	9783959772464
Original language	English

Publication series

Name	Leibniz International Proceedings in Informatics, LIPIcs
Volume	243
ISSN (Print)	1868-8969

Abstract

We consider the problem of approximating the reachability probabilities in Markov decision processes (MDPs) with uncountable (continuous) state and action spaces. While there are algorithms that, for special classes of such MDPs, provide a sequence of approximations converging to the true value in the limit, our aim is to obtain an algorithm with guarantees on the precision of the approximation. As this problem is undecidable in general, assumptions on the MDP are necessary. Our main contribution is to identify sufficient assumptions that are as weak as possible, thus approaching the “boundary” of which systems can be correctly and reliably analyzed. To this end, we also argue why each of our assumptions is necessary for algorithms based on processing finitely many observations. We present two solution variants. The first one provides converging lower bounds under weaker assumptions than typical ones from previous works concerned with guarantees. The second one then utilizes stronger assumptions to additionally provide converging upper bounds. Altogether, we obtain an anytime algorithm, i.e., one yielding a sequence of approximants with known and iteratively improving precision, converging to the true value in the limit. Moreover, due to the generality of our assumptions, our algorithms are very general templates that readily admit various heuristics from the literature, in contrast to, e.g., a specific discretization algorithm. Our theoretical contribution thus paves the way for future practical improvements without sacrificing correctness guarantees.
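To make the notion of an anytime guarantee concrete, the sketch below shows a much simpler, finite-state analogue: interval iteration (bounded value iteration) on a small, hypothetical MDP. It is not the algorithm of the paper, which targets uncountable state and action spaces; the toy MDP, its state names, and the helpers `bellman`, `anytime_bounds`, and `bounds_until` are all illustrative assumptions. Each iteration yields a lower and an upper bound on the maximal probability of reaching `goal`, so the current gap is a known precision at every step. Iterating the upper bound down from 1 is only sound here because the toy MDP is assumed to have no end components other than the absorbing `goal` and `fail` states.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy MDP (not from the paper): state -> action -> [(successor, probability)].
# "goal" and "fail" are absorbing. Assumption: no end components besides them,
# which is what makes the upper-bound iteration from 1 converge to the true value.
Transitions = Dict[str, Dict[str, List[Tuple[str, float]]]]
MDP: Transitions = {
    "s0": {"a": [("s1", 0.5), ("fail", 0.5)], "b": [("goal", 0.3), ("s1", 0.7)]},
    "s1": {"c": [("goal", 0.6), ("fail", 0.4)],
           "d": [("s0", 0.2), ("goal", 0.5), ("fail", 0.3)]},
}
GOAL, FAIL = "goal", "fail"


def bellman(mdp: Transitions, values: Dict[str, float]) -> Dict[str, float]:
    """One synchronous Bellman update for the maximal reachability probability."""
    new = dict(values)  # goal/fail are not keys of mdp, so they keep 1.0 / 0.0
    for s, actions in mdp.items():
        new[s] = max(sum(p * values[t] for t, p in succ) for succ in actions.values())
    return new


def anytime_bounds(mdp: Transitions, state: str) -> Iterator[Tuple[float, float]]:
    """Yield ever tighter (lower, upper) bounds on the value of `state`."""
    lower = {s: 0.0 for s in mdp}
    upper = {s: 1.0 for s in mdp}
    for v in (lower, upper):
        v[GOAL], v[FAIL] = 1.0, 0.0
    while True:
        lower, upper = bellman(mdp, lower), bellman(mdp, upper)
        yield lower[state], upper[state]


def bounds_until(mdp: Transitions, state: str, eps: float) -> List[Tuple[float, float]]:
    """Consume the anytime iterator until the known precision drops below eps."""
    history: List[Tuple[float, float]] = []
    for lo, hi in anytime_bounds(mdp, state):
        history.append((lo, hi))
        if hi - lo < eps:
            return history
    return history  # not reached: the iterator is infinite


if __name__ == "__main__":
    for lo, hi in bounds_until(MDP, "s0", 1e-6):
        print(f"{lo:.6f} <= value(s0) <= {hi:.6f}")
```

The caller can stop at any iteration and still hold a valid interval; this "stop whenever the precision suffices" behaviour is what the abstract means by an anytime algorithm, although the paper obtains it for uncountable MDPs under its own assumptions rather than by the finite interval iteration shown here.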
