Deep reinforcement learning has been investigated in high-dimensional continuous control tasks. Deep Deterministic Policy Gradients (DDPG) is known as a highly sample-efficient policy gradient algorithm. However, DDPG has been reported to be unstable during training because of bias and variance problems in learning its action-value function. In this paper, we propose Policy Gradients with Memory Augmented Critic (PGMAC), which builds the action-value function with the memory module previously proposed as …
doi:10.1527/tjsai.36-1_b-k71 fatcat:rg3vzrgzdfhg3p3zqnq5udllwa