A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2011; you can also visit the original URL.
The file type is application/pdf.
Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor
2013
Journal of the ACM
Ye showed recently that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most $O\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations, where $n$ is the number of states, $m$ is the total number of actions in the MDP, and $0 < \gamma < 1$ is the discount factor. We improve Ye's analysis in two respects. First,
doi:10.1145/2432622.2432623
fatcat:ntouvh6v4jd25cn32bugrefwy4