A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
[article]
2019
arXiv pre-print
We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement step depends on the gradient of the estimated
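As background for the off-policy critic evaluation step the abstract describes, the sketch below shows one standard doubly robust off-policy value estimator (in the style of Jiang & Li, 2016), which combines a learned critic with importance-weighted returns. This is a minimal illustration, not the paper's algorithm: the function names, the trajectory representation, and the baselines `q_hat`/`v_hat` are all assumptions for the example.

```python
# Minimal sketch of a doubly robust (DR) off-policy value estimator.
# NOTE: illustrative only; not taken from the paper. All names here
# (dr_estimate, q_hat, v_hat, pi_target, pi_behavior) are hypothetical.

def dr_estimate(trajectory, q_hat, v_hat, pi_target, pi_behavior, gamma=0.99):
    """Backward-recursive DR estimate of a trajectory's value.

    trajectory:  list of (state, action, reward) tuples collected
                 under the behavior policy
    q_hat(s, a): learned action-value baseline (the critic)
    v_hat(s):    learned state-value baseline
    pi_target(a, s), pi_behavior(a, s): action probabilities under the
                 target (new) and behavior (data-collecting) policies
    """
    v_dr = 0.0
    # Walk the trajectory backwards, correcting the model-based baseline
    # with importance-weighted temporal-difference residuals.
    for s, a, r in reversed(trajectory):
        rho = pi_target(a, s) / pi_behavior(a, s)  # per-step importance weight
        v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr
```

When the baselines `q_hat` and `v_hat` are zero, this reduces to a per-decision importance-sampling estimate; when the importance weights equal one, it reduces to the model-based estimate `v_hat` plus a TD correction, which is the variance-reduction trade-off doubly robust estimators exploit.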
arXiv:1912.05109v1
fatcat:plnttwxjrncz3fh25ig4h5q2a4