The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

2015
*
Mathematics of Operations Research
*

We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with $n$ states and $m$ actions, we prove the simplex method runs in $O(n^3 m^2 \log^2 n)$ iterations if the discount factor is uniform and $O(n^5 m^3 \log^2 n)$ iterations if each action has a distinct discount factor. Previously the simplex method was known to
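
To make the pivoting rule concrete, here is a minimal sketch of a most-negative-reduced-cost pivot loop on a deterministic MDP. It is not taken from the paper. It assumes a cost-minimization sign convention, and the action layout `(source, target, cost, discount)`, the function names, and the two-state instance are all hypothetical choices made for illustration. It shows the mechanics of one pivot and says nothing about the iteration bounds.

```python
from fractions import Fraction

# Hypothetical encoding: each action is (source, target, cost, discount).
# A policy is a list giving, for each state, the index of its chosen action.

def evaluate_policy(n, actions, policy):
    """Solve v[s] = cost + discount * v[target] exactly by Gauss-Jordan elimination."""
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in range(n):
        _, tgt, cost, disc = actions[policy[s]]
        A[s][s] += 1
        A[s][tgt] -= disc
        A[s][n] = Fraction(cost)
    for col in range(n):
        # A nonzero pivot exists because every discount is strictly below 1.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[s][n] for s in range(n)]

def reduced_costs(actions, v):
    """Reduced cost of each action under the assumed minimization convention."""
    return [cost + disc * v[tgt] - v[src] for (src, tgt, cost, disc) in actions]

def simplex_most_negative(n, actions, policy):
    """Pivot on the most negative reduced cost until none is negative."""
    policy = list(policy)
    pivots = 0
    while True:
        v = evaluate_policy(n, actions, policy)
        rc = reduced_costs(actions, v)
        a = min(range(len(actions)), key=lambda i: rc[i])
        if rc[a] >= 0:
            return policy, v, pivots
        policy[actions[a][0]] = a  # the entering action replaces its state's current action
        pivots += 1

# Illustrative two-state instance with a uniform discount of 1/2.
half = Fraction(1, 2)
example_actions = [
    (0, 0, 2, half),  # stay in state 0, cost 2
    (0, 1, 1, half),  # move from state 0 to state 1, cost 1
    (1, 1, 0, half),  # stay in state 1, cost 0
    (1, 0, 3, half),  # move from state 1 to state 0, cost 3
]
final_policy, final_values, num_pivots = simplex_most_negative(2, example_actions, [0, 3])
# Reaches policy [1, 2] with values [1, 0] after 2 pivots.
```

Exact rational arithmetic is used so that the sign of each reduced cost is never affected by rounding. A floating-point version would need a tolerance in the optimality test.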

