Model-Free IRL Using Maximum Likelihood Estimation

Vinamra Jain, Prashant Doshi, Bikramjit Banerjee
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI 2019)
The problem of learning an expert's unknown reward function from a limited number of demonstrations of the expert's behavior is studied in inverse reinforcement learning (IRL). To gain traction on this challenging and underconstrained problem, IRL methods predominantly represent the expert's reward function as a linear combination of known features. Most existing IRL algorithms either assume that a transition function is available or learn it through a complex and inefficient procedure. In this paper, we present a model-free approach to IRL that casts the problem in a maximum likelihood framework. We modify model-free Q-learning by replacing its maximization so that the gradient of the Q-function can be computed, and we use gradient ascent on the feature weights to maximize the likelihood of the expert's trajectories. On two problem domains, we demonstrate that our approach improves the likelihood compared to previous methods.
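To make the idea concrete, the sketch below illustrates the general pattern the abstract describes: tabular Q-learning in which the hard max is replaced by a Boltzmann (softmax) operator, nested inside gradient ascent on linear reward weights to increase the likelihood of expert state-action pairs. It is a minimal illustration under assumed toy dynamics, features, and a simplified feature-matching gradient, not the paper's algorithm or experimental setup.

```python
import numpy as np

# Hedged sketch: model-free maximum-likelihood IRL with a soft Q-learning inner loop.
# The environment, feature map phi, demonstrations, and hyperparameters are all
# hypothetical placeholders chosen only to make the example self-contained.

n_states, n_actions, n_features = 5, 2, 3
rng = np.random.default_rng(0)
phi = rng.random((n_states, n_actions, n_features))   # feature map phi(s, a)
trajectories = [[(0, 1), (2, 0), (4, 1)]]              # toy expert demos as (state, action) pairs

alpha, gamma, beta = 0.1, 0.9, 2.0                     # Q-learning rate, discount, Boltzmann temperature
eta, n_outer, n_inner = 0.05, 50, 200                  # gradient-ascent step and iteration counts

def soft_policy(q_row):
    """Boltzmann action distribution from one row of Q-values."""
    z = np.exp(beta * (q_row - q_row.max()))
    return z / z.sum()

w = np.zeros(n_features)                               # reward weights: r(s, a) = w . phi(s, a)
for _ in range(n_outer):
    # Inner loop: model-free soft Q-learning under the current reward weights,
    # using sampled transitions of placeholder random-walk dynamics.
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_inner):
        s = rng.integers(n_states)
        a = rng.integers(n_actions)
        s_next = rng.integers(n_states)                # placeholder dynamics, unknown to the learner
        r = phi[s, a] @ w
        # softmax backup in place of a hard max, keeping the update smooth in w
        v_next = soft_policy(Q[s_next]) @ Q[s_next]
        Q[s, a] += alpha * (r + gamma * v_next - Q[s, a])

    # Outer loop: ascend the log-likelihood of the expert's state-action pairs
    # under the Boltzmann policy. The gradient below is a simplified
    # feature-matching approximation (expert features minus policy-expected
    # features), standing in for the exact Q-function gradient the paper derives.
    grad = np.zeros(n_features)
    for traj in trajectories:
        for s, a in traj:
            pi = soft_policy(Q[s])
            grad += phi[s, a] - pi @ phi[s]
    w += eta * grad

print("learned feature weights:", w)
```

The nesting reflects the structure described in the abstract: an inner value-learning loop that needs no transition model, and an outer likelihood-maximization loop over the reward parameters; the exact gradient computation in the paper is more involved than the approximation used here.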
doi:10.1609/aaai.v33i01.33013951