PID Accelerated Value Iteration Algorithm
2021
International Conference on Machine Learning
The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning for solving MDPs, can be slow when the discount factor is close to one. We propose modifications to VI in order to potentially accelerate its convergence behaviour. The key insight is the realization that the evolution of the value function approximations (V_k)_{k≥0} in the VI procedure can be seen as a dynamical system. This opens up the possibility of using techniques from
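To make the slow-convergence claim concrete, here is a minimal sketch of standard (unaccelerated) VI on a hypothetical two-state, two-action MDP. The transition table `P`, rewards `R`, tolerance, and function names are illustrative assumptions, not taken from the paper, and the sketch does not implement the proposed accelerated method. Each update V_{k+1} = T V_k is one step of a discrete-time dynamical system. Because the Bellman optimality operator T is a γ-contraction in the sup-norm, the error shrinks roughly like γ^k, so the number of iterations grows as γ approaches one.

```python
from typing import List, Tuple

# Hypothetical toy MDP (illustrative only, not from the paper):
# P[a][s][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
# R[s][a] = expected immediate reward for taking action a in state s.
R = [[1.0, 0.0], [0.0, 2.0]]


def bellman_optimality(V: List[float], gamma: float) -> List[float]:
    """Apply the Bellman optimality operator T to the value vector V."""
    n_states, n_actions = len(V), len(P)
    return [
        max(
            R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(
    gamma: float, tol: float = 1e-6, max_iter: int = 1_000_000
) -> Tuple[List[float], int]:
    """Iterate V_{k+1} = T V_k from V_0 = 0 until successive iterates differ
    by less than tol in the sup-norm. Returns (V, number of iterations)."""
    V = [0.0] * len(R)
    for k in range(1, max_iter + 1):
        V_next = bellman_optimality(V, gamma)
        err = max(abs(a - b) for a, b in zip(V_next, V))
        V = V_next
        if err < tol:
            return V, k
    return V, max_iter


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        _, iters = value_iteration(gamma)
        print(f"gamma={gamma}: {iters} iterations")
```

Running it should show the iteration count rising sharply as γ goes from 0.5 to 0.99, in line with the roughly log(1/tol)/(1 − γ) scaling of plain VI near γ = 1. This is the behaviour the accelerated variants are meant to improve on.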
dblp:conf/icml/FarahmandG21