Improper Reinforcement Learning with Gradient-based Policy Optimization
Article (arXiv pre-print), 2021
We consider an improper reinforcement learning setting in which a learner is given M base controllers for an unknown Markov decision process and wishes to combine them optimally, producing a potentially new controller that can outperform each of the base ones. This can be useful for tuning across controllers, possibly learnt in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. We propose a gradient-based approach that …
arXiv:2102.08201v3
fatcat:ycfwh2cdpfcafmryqai4uh33pu
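The abstract describes combining M base controllers into a single, possibly improper, controller and tuning that combination with gradient steps. The sketch below illustrates the idea, but its details are assumptions and not taken from the paper: the combination is a softmax-weighted mixture of the base controllers' action distributions, the MDP is a hypothetical two-state example whose model is known (so the value is computed exactly instead of being estimated from trials), and the gradient is approximated by central finite differences. All names (`BASE_CONTROLLERS`, `mixture_policy`, `fd_gradient`, and so on) are hypothetical.

```python
import math

# Hypothetical toy MDP (assumption for illustration): 2 states, 2 actions.
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the reward.
P = {
    0: {0: [(1, 1.0)], 1: [(0, 1.0)]},
    1: {0: [(1, 1.0)], 1: [(0, 1.0)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
GAMMA = 0.9
START_STATE = 0
ACTIONS = (0, 1)

# M = 2 hypothetical base controllers: state -> action probabilities.
BASE_CONTROLLERS = [
    {0: [1.0, 0.0], 1: [1.0, 0.0]},  # always plays action 0
    {0: [0.0, 1.0], 1: [0.0, 1.0]},  # always plays action 1
]


def softmax(theta):
    """Numerically stable softmax over the mixture parameters."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_policy(theta):
    """Assumed improper class: pi(a|s) = sum_m softmax(theta)_m * K_m(a|s)."""
    w = softmax(theta)
    return {
        s: [sum(w[m] * K[s][a] for m, K in enumerate(BASE_CONTROLLERS))
            for a in ACTIONS]
        for s in P
    }


def evaluate(policy, iters=400):
    """Iterative policy evaluation with the known model; returns V(START_STATE)."""
    v = {s: 0.0 for s in P}
    for _ in range(iters):
        # Synchronous Bellman backup: the right-hand side reads the old v.
        v = {
            s: sum(
                policy[s][a] * (R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]))
                for a in ACTIONS
            )
            for s in P
        }
    return v[START_STATE]


def value(theta):
    return evaluate(mixture_policy(theta))


def fd_gradient(theta, eps=1e-5):
    """Central finite differences, a stand-in for an exact gradient oracle."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((value(up) - value(down)) / (2 * eps))
    return grad


def optimize(theta, lr=0.1, steps=100):
    """Plain gradient ascent on the mixture parameters."""
    for _ in range(steps):
        g = fd_gradient(theta)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    base_values = [evaluate(K) for K in BASE_CONTROLLERS]
    theta = optimize([0.0, 0.0])
    print("base values:", base_values)
    print("mixture weights:", softmax(theta), "value:", value(theta))
```

With these toy numbers the two base controllers are worth 1 and 0 from state 0, while the mixture settles near a weight of 19/36 ≈ 0.528 on the first controller, worth 361/72 ≈ 5.01. The combination beats both base controllers, which is the point of the improper setting. A state-independent mixture can still fall short of the best policy for this MDP (here 1/(1 − γ) = 10), and with an unknown MDP the value and its gradient would have to be estimated from trials rather than computed from the model.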