A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
2020
Neural Information Processing Systems
Entropy regularization, smoothing of Q-values, and neural network function approximators are key components of state-of-the-art reinforcement learning (RL) algorithms such as Soft Actor-Critic [1]. Despite their widespread use, the impact of these core techniques on the convergence of RL algorithms is not yet fully understood. In this work, we analyse these techniques from an error propagation perspective using the approximate dynamic programming framework. In particular, our analysis shows that …
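For orientation, the sketch below gives a standard form of the entropy-regularized (soft) Bellman backup that underlies such schemes; this is a common textbook convention assumed here for illustration, not necessarily the exact operator analysed in the paper. Here \tau > 0 denotes the entropy temperature and \gamma the discount factor.

% A standard entropy-regularized (soft) Bellman operator, assumed for illustration.
\[
  (\mathcal{T}_{\tau} Q)(s, a)
    \;=\; r(s, a)
    \;+\; \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
      \Big[ \tau \log \sum_{a'} \exp\!\big( Q(s', a') / \tau \big) \Big].
\]

As \tau \to 0 the log-sum-exp backup recovers the hard max over actions of standard value iteration, while larger \tau smooths the greedy step; this smoothed backup is the kind of regularized operator whose error-propagation behaviour the paper studies.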
dblp:conf/nips/SmirnovaD20
fatcat:s7dwxi2sqbepvj3quvvwvrxcpa