A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL. The file type is application/pdf.
Monotonic Robust Policy Optimization with Model Discrepancy
2021
International Conference on Machine Learning
State-of-the-art deep reinforcement learning (DRL) algorithms tend to overfit due to the model discrepancy between source and target environments. Although applying domain randomization during training can improve average performance by randomly generating a sufficiently diverse set of environments in the simulator, the worst-case environment is still neglected, without any performance guarantee. Since both the average and worst-case performance are important for generalization in RL, in this paper, we
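To make the average/worst-case distinction in the abstract concrete, here is a minimal, self-contained Python sketch of evaluating a fixed policy under domain randomization. It is purely illustrative and not the paper's algorithm; the environment factory `make_env`, the parameter names (`mass`, `friction`), their ranges, and the toy reward are all hypothetical stand-ins for a randomized simulator.

```python
import random


def make_env(mass, friction):
    """Hypothetical environment factory: returns a rollout function whose
    reward depends on the randomized physical parameters (illustrative only)."""
    def rollout(policy_gain):
        # Toy dynamics: reward degrades as the policy's gain mismatches
        # the environment's randomized parameters.
        return -abs(policy_gain - mass * friction)
    return rollout


def evaluate(policy_gain, n_envs=100, seed=0):
    """Evaluate one policy under domain randomization: sample many
    environments and report both the average and the worst-case return."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n_envs):
        mass = rng.uniform(0.5, 1.5)      # randomized parameter ranges
        friction = rng.uniform(0.5, 1.5)  # (assumed values, for illustration)
        env = make_env(mass, friction)
        returns.append(env(policy_gain))
    return sum(returns) / len(returns), min(returns)


avg_ret, worst_ret = evaluate(policy_gain=1.0)
print(f"average return: {avg_ret:.3f}, worst-case return: {worst_ret:.3f}")
```

The point of the sketch is the final line: standard domain randomization optimizes something like the average over sampled environments, while the minimum over the same samples (the worst case) can remain poor, which is the gap the paper targets.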
dblp:conf/icml/JiangLDZX21
fatcat:upveei6fjfacdhvfvyliovd2hy