A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing function of the value of the discount factor.arXiv:2001.10174v1 fatcat:gss3vsncgvd3tcx2nu4xj2hjqa