Optimistic Value Iteration [chapter]

Arnd Hartmanns, Benjamin Lucien Kaminski
2020 Lecture Notes in Computer Science  
Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration's ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
doi:10.1007/978-3-030-53291-8_26
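The loop described in the abstract (a lower bound from value iteration, an optimistic upper-bound guess, then a correctness check) can be sketched for maximal reachability probabilities as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm or the mcsta implementation. The dictionary-based MDP encoding, the function names, the absolute-error guess min(1, lower + eps), the verification budget verify_iters, and the strategy of halving the value-iteration tolerance after a failed guess are all hypothetical choices made for this example. The check relies on a standard fixpoint fact: reachability probabilities are the least fixpoint of the monotone Bellman operator Φ, so any vector u with Φ(u) ≤ u in every state is an upper bound (Park induction).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical explicit MDP encoding (an assumption for this sketch):
# state -> list of actions, each action a list of (probability, successor) pairs.
MDP = Dict[str, List[List[Tuple[float, str]]]]
Vec = Dict[str, float]


def bellman(mdp: MDP, goal: Set[str], x: Vec) -> Vec:
    """One Bellman update (Phi) for maximal reachability; goal states are fixed to 1."""
    return {
        s: 1.0 if s in goal
        else max((sum(p * x[t] for p, t in act) for act in acts), default=0.0)
        for s, acts in mdp.items()
    }


def value_iteration(mdp: MDP, goal: Set[str], x: Vec, tol: float) -> Vec:
    """Standard VI: iterate until successive vectors differ by less than tol.
    Starting from a lower bound, every iterate is still a lower bound, but this
    stopping criterion alone says nothing about the distance to the true value."""
    while True:
        y = bellman(mdp, goal, x)
        if max(abs(y[s] - x[s]) for s in mdp) < tol:
            return y
        x = y


def optimistic_vi(mdp: MDP, goal: Set[str], eps: float = 1e-6,
                  verify_iters: int = 100, max_rounds: int = 60) -> Tuple[Vec, Vec]:
    """Hypothetical OVI-style sketch returning (lower, upper) bounds."""
    lower: Vec = {s: 0.0 for s in mdp}
    tol = eps
    for _ in range(max_rounds):
        # 1. Lower bound via standard value iteration.
        lower = value_iteration(mdp, goal, lower, tol)
        # 2. Optimistic guess: lift the lower bound by eps, capped at 1.
        upper = {s: min(1.0, v + eps) for s, v in lower.items()}
        # 3. Verification: iterate both vectors in lockstep, looking for Phi(upper) <= upper.
        for _ in range(verify_iters):
            new_upper = bellman(mdp, goal, upper)
            if all(new_upper[s] <= upper[s] for s in mdp):
                # Park induction: Phi(upper) <= upper implies upper >= true value
                # (exact-arithmetic argument; floating-point rounding is ignored).
                return lower, upper
            lower = bellman(mdp, goal, lower)
            upper = new_upper
        tol /= 2  # guess not verified: tighten the lower bound and retry
    raise RuntimeError("no inductive upper bound found within max_rounds")
```

Soundness in this sketch rests only on the final Φ(u) ≤ u test, so the guessing and retry heuristics can be swapped without affecting correctness (in exact arithmetic; the floating-point comparison used here is not a rigorous proof). Because both vectors are updated in lockstep and Φ is monotone and non-expansive, the returned bounds also stay within eps of each other.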