We prove that the simplex method with the highest-gain/most-negative-reduced-cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action has a distinct discount factor. Previously the simplex method was known to run in …

arXiv:1208.5083v2
fatcat:qrvqltlxxbdg7jl63vw2sbkt6q
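To make the pivoting rule concrete, the following is a minimal sketch of a simplex-style iteration on a tiny deterministic MDP, not the paper's own formulation. It assumes a cost-minimization setting, a hypothetical encoding of each action as a tuple `(source, target, cost, discount)`, and discount factors strictly below 1. A pivot switches the policy at the source state of the action with the most negative reduced cost `cost + discount * v[target] - v[source]`. The function names `evaluate` and `simplex` and the example instance are illustrative only.

```python
from fractions import Fraction

def evaluate(policy, actions, n):
    """Solve v[s] = cost + discount * v[target] exactly for the chosen actions."""
    # Augmented matrix of the linear system (I - gamma * P) v = c.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in range(n):
        _, t, c, g = actions[policy[s]]
        A[s][s] += 1
        A[s][t] -= g
        A[s][n] = Fraction(c)
    # Gauss-Jordan elimination; the system is nonsingular when every discount < 1.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[s][n] for s in range(n)]

def simplex(actions, n, policy):
    """Pivot on the most negative reduced cost until none is negative."""
    policy = list(policy)
    pivots = 0
    while True:
        v = evaluate(policy, actions, n)
        best, best_rc = None, Fraction(0)
        for idx, (s, t, c, g) in enumerate(actions):
            rc = c + g * v[t] - v[s]  # reduced cost of this action
            if rc < best_rc:
                best, best_rc = idx, rc
        if best is None:
            return policy, v, pivots
        policy[actions[best][0]] = best  # switch one state's action (one pivot)
        pivots += 1

# Illustrative instance (assumed, not from the paper): 3 states, uniform discount 1/2.
half = Fraction(1, 2)
actions = [
    (0, 0, 5, half),  # 0: stay in state 0
    (0, 1, 1, half),  # 1: move from state 0 to state 1
    (1, 1, 3, half),  # 2: stay in state 1
    (1, 2, 1, half),  # 3: move from state 1 to state 2
    (2, 2, 1, half),  # 4: stay in state 2
]
final_policy, values, pivots = simplex(actions, 3, [0, 2, 4])
print(final_policy, values, pivots)
```

Exact rational arithmetic is used here so that reduced-cost comparisons are not affected by rounding; a floating-point version would need a tolerance when testing for negativity.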