A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Offline Policy Comparison under Limited Historical Agent-Environment Interactions
[article]
2021
arXiv
pre-print
We address the challenge of policy evaluation in real-world applications of reinforcement learning systems where the available historical data is limited due to ethical, practical, or security considerations. This constrained distribution of data samples often leads to biased policy evaluation estimates. To remedy this, we propose that instead of policy evaluation, one should perform policy comparison, i.e. to rank the policies of interest in terms of their value based on available historical
arXiv:2106.03934v1
fatcat:7qdbptejevcv5cbh4rwawgrfre