A copy of this work was available on the public web and has been preserved in the Wayback Machine (captured 2022); the original URL can also be visited. The file type is application/pdf.
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
[article], 2022, arXiv pre-print
We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a policy from selecting unobserved actions, since the consequences of these actions cannot be presumed without additional information about the environment. One straightforward way to implement such a constraint is to explicitly model the given data distribution via behavior cloning and directly force a policy not to …
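The constraint described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, just a toy example under assumed conditions: a small discrete state and action space, a behavior policy "cloned" by counting state-action frequencies in the offline dataset, and greedy action selection restricted to actions the cloned behavior policy supports above a threshold. The function names (`clone_behavior`, `constrained_greedy`) and the threshold `eps` are hypothetical.

```python
import numpy as np

def clone_behavior(dataset, n_states, n_actions):
    """Estimate a behavior policy beta(a|s) by counting (state, action)
    pairs in the offline dataset (a simple form of behavior cloning)."""
    counts = np.zeros((n_states, n_actions))
    for s, a in dataset:
        counts[s, a] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for states never visited in the dataset.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def constrained_greedy(q_values, behavior, state, eps=0.05):
    """Pick argmax_a Q(s, a), but only over actions with beta(a|s) >= eps,
    so the policy cannot choose actions unobserved in the data."""
    support = behavior[state] >= eps
    masked = np.where(support, q_values[state], -np.inf)
    return int(np.argmax(masked))

# Toy offline dataset of (state, action) pairs (hypothetical data).
dataset = [(0, 1), (0, 1), (0, 2), (1, 0)]
behavior = clone_behavior(dataset, n_states=2, n_actions=3)

# A Q-table whose unconstrained argmax would pick unobserved actions:
# in state 0 it favors action 0, which never appears in the dataset.
q = np.array([[5.0, 1.0, 2.0],
              [0.0, 9.0, 3.0]])

print(constrained_greedy(q, behavior, state=0))  # → 2 (action 0 is masked out)
print(constrained_greedy(q, behavior, state=1))  # → 0 (only observed action)
```

The point of the sketch is the masking step: even though the learned Q-values overestimate unobserved actions, the cloned behavior distribution defines the boundary of actions the policy may select.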
arXiv:2206.00695v1
fatcat:mmmikxwwmzdn3gcaiqssadna5m