Using inaccurate models in reinforcement learning

Pieter Abbeel, Morgan Quigley, Andrew Y. Ng
Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
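A minimal sketch of the grounding idea described in the abstract, not the authors' exact algorithm: a real-life rollout of the current policy is used to add a time-indexed correction to the approximate model so that it reproduces the observed trajectory, and the corrected model is then used only to suggest a local policy improvement. The helpers `approx_model`, `real_rollout`, and `local_policy_search` are hypothetical placeholders, and states are assumed to be array-like objects supporting subtraction.

```python
def grounded_policy_search(policy, approx_model, real_rollout,
                           local_policy_search, n_iters=10):
    """Hedged sketch: alternate real-life evaluation with model-based local search.

    approx_model(s, a)        -> predicted next state (assumed, inaccurate model)
    real_rollout(policy)      -> (states, actions, next_states) from a real trial
    local_policy_search(p, m) -> policy locally improved under model m(s, a, t)
    """
    for _ in range(n_iters):
        # 1. Evaluate the current policy on the real system (a small number of trials).
        states, actions, next_states = real_rollout(policy)

        # 2. Compute time-indexed corrections so the grounded model matches
        #    the real trajectory exactly under the current policy.
        corrections = [s_next - approx_model(s, a)
                       for s, a, s_next in zip(states, actions, next_states)]

        def corrected_model(s, a, t, _c=corrections):
            # Approximate model plus the correction observed at time step t.
            return approx_model(s, a) + _c[min(t, len(_c) - 1)]

        # 3. Use the grounded model only to propose a local change to the policy;
        #    the next real-life trial then re-grounds the evaluation.
        policy = local_policy_search(policy, corrected_model)

    return policy
```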
doi:10.1145/1143844.1143845 dblp:conf/icml/AbbeelQN06