A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
2021
Neural Computation
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample efficient learning. However,
doi:10.1162/neco_a_01387
pmid:34496391
fatcat:xgf75zm5o5blloexd4zjtoaiua