PID Accelerated Value Iteration Algorithm

Amir Massoud Farahmand, Mohammad Ghavamzadeh
2021 International Conference on Machine Learning  
The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning for solving MDPs, can be slow when the discount factor is close to one. We propose modifications to VI in order to potentially accelerate its convergence behaviour. The key insight is the realization that the evolution of the value function approximations $(V_k)_{k \ge 0}$ in the VI procedure can be seen as a dynamical system. This opens up the possibility of using techniques from control theory to modify, and potentially accelerate, these dynamics. We present such modifications based on simple controllers, such as PD (Proportional-Derivative), PI (Proportional-Integral), and PID. We present the error dynamics of these variants of VI, and show, provably (for certain classes of MDPs) and empirically (for more general classes), that the convergence rate can be significantly improved. We also propose a gain adaptation mechanism to select the controller gains automatically, and empirically show the effectiveness of this procedure.
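For policy evaluation with a fixed policy $\pi$, the dynamical-system view can be made concrete. The derivation below is a sketch that assumes a PD update of the form $V_{k+1} = V_k + \kappa_p (T^\pi V_k - V_k) + \kappa_d (V_k - V_{k-1})$. The gain names $\kappa_p, \kappa_d$ and this exact parameterization are illustrative assumptions and may not match the form used in the paper. It shows how the error $e_k = V_k - V^\pi$ evolves as a linear system, and that $\kappa_p = 1, \kappa_d = 0$ recovers ordinary VI with rate $\gamma$.

```latex
% Policy-evaluation Bellman operator and its fixed point
T^\pi V = r^\pi + \gamma P^\pi V, \qquad V^\pi = T^\pi V^\pi, \qquad e_k = V_k - V^\pi

% Bellman residual expressed through the error (uses r^\pi = (I - \gamma P^\pi) V^\pi)
T^\pi V_k - V_k = (\gamma P^\pi - I)\, e_k

% Assumed PD update and the resulting error dynamics
V_{k+1} = V_k + \kappa_p (T^\pi V_k - V_k) + \kappa_d (V_k - V_{k-1})
\;\Longrightarrow\;
e_{k+1} = \big[(1 - \kappa_p) I + \kappa_p \gamma P^\pi\big] e_k + \kappa_d (e_k - e_{k-1})

% Stacked first-order form: convergence rate = spectral radius of the block matrix
\begin{pmatrix} e_{k+1} \\ e_k \end{pmatrix}
=
\begin{pmatrix} (1 - \kappa_p + \kappa_d) I + \kappa_p \gamma P^\pi & -\kappa_d I \\ I & 0 \end{pmatrix}
\begin{pmatrix} e_k \\ e_{k-1} \end{pmatrix}

% Sanity check: \kappa_p = 1, \kappa_d = 0 gives e_{k+1} = \gamma P^\pi e_k (standard VI)
```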
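A minimal numerical sketch of the same idea in Python with NumPy, restricted to policy evaluation on a small Markov chain. The function `pid_vi`, the leaky integrator `z` with parameters `alpha` and `beta`, and the chosen gains are hypothetical illustrations; the paper's own PID update and its gain-adaptation procedure may differ. With `kp=1, ki=0, kd=0` the loop is plain VI. A nonzero integral gain still leaves the true value function as the fixed point, because the Bellman residual (and hence `z`) vanishes there.

```python
import numpy as np


def bellman_pe(V, r, P, gamma):
    """Policy-evaluation Bellman operator: T V = r + gamma * P V."""
    return r + gamma * (P @ V)


def pid_vi(r, P, gamma, kp=1.0, ki=0.0, kd=0.0, alpha=0.05, beta=0.95, iters=300):
    """Hypothetical PID-style VI for policy evaluation (a sketch, not the paper's code).

    The Bellman residual T V_k - V_k acts as the control error.
    kp=1, ki=0, kd=0 recovers standard VI: V_{k+1} = T V_k.
    """
    V_prev = np.zeros_like(r)
    V = np.zeros_like(r)
    z = np.zeros_like(r)  # leaky integral of past Bellman residuals (assumed form)
    for _ in range(iters):
        residual = bellman_pe(V, r, P, gamma) - V  # proportional term input
        z = beta * z + alpha * residual            # integral term input
        derivative = V - V_prev                    # derivative term: V_k - V_{k-1}
        V_next = V + kp * residual + ki * z + kd * derivative
        V_prev, V = V, V_next
    return V


# Lazy random walk on a cycle: symmetric, row-stochastic, eigenvalues in [0, 1].
n, gamma = 20, 0.99
shift = np.roll(np.eye(n), 1, axis=1)
P = 0.5 * np.eye(n) + 0.25 * (shift + shift.T)
r = np.linspace(0.0, 1.0, n)
V_true = np.linalg.solve(np.eye(n) - gamma * P, r)

vi_err = np.max(np.abs(pid_vi(r, P, gamma) - V_true))
pd_err = np.max(np.abs(pid_vi(r, P, gamma, kp=1.0, kd=0.81) - V_true))
print(f"VI error: {vi_err:.2e}   PD-VI error: {pd_err:.2e}")
```

The value `kd=0.81` is hand-tuned for this example. The eigenvalues of γP here lie in [0, 0.99], and with `kp=1` each error mode follows the characteristic polynomial x² − (μ + κ_d)x + κ_d. Choosing κ_d = 0.81 gives a double root at 0.9 for μ = 0.99 and complex roots of modulus 0.9 for every smaller μ, so PD-VI's error shrinks by roughly 0.9 per iteration instead of 0.99. This tuning requires knowing the spectrum; the gain adaptation mechanism mentioned above aims to avoid it.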
dblp:conf/icml/FarahmandG21 fatcat:3omjpn7pc5b5pcteswcm46jaqm