Regularly Updated Deterministic Policy Gradient Algorithm

Shuai Han and Wenbo Zhou and Shuai Lü and Jiayu Yu
2020 · arXiv · pre-print
The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the best-known reinforcement learning methods. In practice, however, it is often inefficient and unstable, and the bias and variance of the Q-value estimate in the target function can be difficult to control. This paper proposes the Regularly Updated Deterministic (RUD) policy gradient algorithm to address these problems. It proves theoretically that the learning procedure with RUD makes better use of new data in the replay buffer than the traditional procedure. In addition, the lower variance of the Q value under RUD is better suited to the current Clipped Double Q-learning strategy. The paper presents comparison experiments against previous methods, an ablation experiment against the original DDPG, and further analytical experiments in MuJoCo environments. The experimental results demonstrate the effectiveness and superiority of RUD.
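The Clipped Double Q-learning strategy mentioned above can be sketched as follows. This is a minimal, illustrative NumPy implementation of the target computation popularized by TD3 (not the paper's own code): the bootstrapped target takes the minimum of two critic estimates so that a single overestimating critic cannot inflate the target; the function and variable names are assumptions for illustration.

```python
import numpy as np

def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Compute y = r + gamma * (1 - done) * min(Q1', Q2').

    Taking the element-wise minimum of the two target-critic estimates
    is the 'clipping' that curbs Q-value overestimation bias.
    """
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# Example batch: two critics disagree on the next-state value; the
# pessimistic minimum is used in the bootstrapped target.
r = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])   # second transition is terminal
q1 = np.array([10.0, 3.0])    # target-critic 1 estimates of Q(s', a')
q2 = np.array([8.0, 4.0])     # target-critic 2 estimates of Q(s', a')
y = clipped_double_q_target(r, done, q1, q2)
# y[0] = 1.0 + 0.99 * min(10, 8) = 8.92 ; y[1] = 0.5 (terminal, no bootstrap)
```

Both critics are then regressed toward the same target y; the abstract's point is that the lower-variance Q estimates produced by RUD interact well with this pessimistic minimum.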
arXiv:2007.00169v1