A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay
2021
IEEE Access
The traditional deep deterministic policy gradient (DDPG) algorithm has the disadvantages of slow convergence velocity and ease of falling into the local optimum. From these two perspectives, a DDPG algorithm based on the double network prioritized experience replay mechanism (DNPER-DDPG) is proposed in this paper. Firstly, the value function is approximated by introducing the idea of two neural networks, and the minimum of the action value functions generated by the two networks is selected as
doi:10.1109/access.2021.3074535
fatcat:eyruw4ha6vh2ndxf5dm3cznrhq