A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
A New Framework for Query Efficient Active Imitation Learning
[article]
2019
arXiv
pre-print
We seek to align an agent's policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge of the dynamics, the reward function, or the unsafe states. A human expert knows the rewards and unsafe states, based on the expert's own preferences and objectives, but querying that expert is expensive. To address this challenge, we propose a new framework for an imitation learning (IL) algorithm that actively and interactively learns a model of the user's reward function with
arXiv:1912.13037v1
fatcat:gnucjz2ofjf7vl43tyurup7yci