Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach
[article]
2020
arXiv
pre-print
For reinforcement learning techniques to be useful in real-world decision-making processes, they must be able to produce robust performance from limited data. Deep policy optimization methods have achieved impressive results on complex tasks, but their real-world adoption remains limited because they often require significant amounts of data to succeed. When combined with small sample sizes, these methods can result in unstable learning due to their reliance on high-dimensional
arXiv:2012.10791v1
fatcat:euqqdcoi4rea3i52ww7p3poyte