Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

Omid Madani, Mikkel Thorup, Uri Zwick
2010 ACM Transactions on Algorithms  
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n^2 log n) time, where n is the number of vertices (or states) and m is the number of edges (or actions). The second one is an adaptation of a classical algorithm of Karp for finding minimum mean weight cycles. It runs in ... n) time. The first algorithm has a slightly slower worst-case complexity, but is faster than the second algorithm in many situations. Both algorithms improve on a recent O(mn^2)-time algorithm of Andersson and Vorobyov. We also present a randomized Õ(m^(1/2) n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.
doi:10.1145/1721837.1721849 fatcat:3p24vqhmirclfkl7ydjaxekaoy
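
For orientation, the classical minimum mean weight cycle algorithm of Karp that the abstract refers to can be sketched as follows. This is only the textbook undiscounted version, not the discounted DMDP adaptation described in the paper. The function name `min_mean_cycle`, the edge-list input format, and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction


def min_mean_cycle(n, edges):
    """Return the minimum mean weight over all cycles, or None if acyclic.

    n     -- number of vertices, labeled 0..n-1 (hypothetical convention)
    edges -- list of (u, v, w) directed edges with integer weight w
    """
    INF = float("inf")
    # d[k][v] = minimum weight of a walk with exactly k edges ending at v.
    # Setting d[0][v] = 0 for every v acts like a zero-weight virtual source,
    # so the graph does not have to be strongly connected.
    d = [[INF] * n for _ in range(n + 1)]
    d[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] != INF and d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    # Karp's formula: min over v of max over k < n of (d_n(v) - d_k(v)) / (n - k).
    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue  # no n-edge walk ends at v
        worst = max(
            Fraction(d[n][v] - d[k][v], n - k)
            for k in range(n)
            if d[k][v] != INF
        )
        if best is None or worst < best:
            best = worst
    return best
```

The table `d` has (n + 1) rows and each row is filled with one pass over the m edges, so this sketch takes O(mn) time and O(n^2) space.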