ADER: Adapting between Exploration and Robustness for Actor-Critic Methods
[article] arXiv pre-print, 2021
Combining off-policy reinforcement learning methods with function approximators such as neural networks has been found to lead to overestimation of the value function and sub-optimal solutions. Improvements such as TD3 have been proposed to address this issue. However, we surprisingly find that its performance lags behind the vanilla actor-critic methods (such as DDPG) in some primitive environments. In this paper, we show that the failure in some cases can be attributed to insufficient exploration.
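For context (a standard statement of the published DDPG and TD3 critic targets, not taken from this preprint's text): DDPG bootstraps its critic from a single target network,

    y = r + γ Q_{θ'}(s', μ_{φ'}(s')),

whereas TD3's clipped double-Q learning takes the minimum over two target critics and smooths the target action with clipped noise,

    y = r + γ min_{i=1,2} Q_{θ'_i}(s', μ_{φ'}(s') + ε),   ε ~ clip(N(0, σ), −c, c).

The min operator suppresses overestimation, but the resulting pessimism can also dampen the optimism that drives exploration, which is the exploration-robustness trade-off the abstract points to.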
arXiv:2109.03443v1
fatcat:wwqqqgig7na6zpckv5drqj6efe