4 Hits in 3.9 sec

IQ-Learn: Inverse soft-Q Learning for Imitation [article]

Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Matthieu Geist, Stefano Ermon
2022 arXiv   pre-print
Our method, Inverse soft-Q learning (IQ-Learn), obtains state-of-the-art results in offline and online imitation learning settings, significantly outperforming existing methods both in the number of required  ...  We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function, implicitly representing both reward and policy.  ...  Approach: In this section, we develop our inverse soft-Q learning (IQ-Learn) algorithm, such that it recovers the optimal soft Q-function for an MDP from a given expert distribution.  ... 
arXiv:2106.12142v3 fatcat:uqvskmi6dzfwbe3zqepwrtqjmu
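The snippet above states IQ-Learn's core idea: a single soft Q-function implicitly encodes both the policy (via a softmax over actions) and the reward (via the inverse soft Bellman operator). A minimal tabular sketch of these standard soft-Q relations follows; this is an illustration under simplified assumptions (discrete actions, known Q-table), not the authors' implementation, and all function names are invented here.

```python
import numpy as np

def soft_value(q_row, alpha=1.0):
    # Soft value: V(s) = alpha * log sum_a exp(Q(s,a)/alpha),
    # a smooth upper bound on max_a Q(s,a).
    return alpha * np.log(np.sum(np.exp(q_row / alpha)))

def soft_policy(q_row, alpha=1.0):
    # Policy recovered from Q: pi(a|s) = exp((Q(s,a) - V(s)) / alpha),
    # i.e. a softmax over actions -- no separate policy network needed.
    return np.exp((q_row - soft_value(q_row, alpha)) / alpha)

def implied_reward(q, s, a, s_next, gamma=0.99, alpha=1.0):
    # Reward recovered from Q via the inverse soft Bellman operator:
    # r(s,a) = Q(s,a) - gamma * V(s') for a deterministic transition.
    return q[s, a] - gamma * soft_value(q[s_next], alpha)

q = np.array([[1.0, 0.0],
              [0.5, 0.5]])        # toy Q-table: 2 states x 2 actions
pi0 = soft_policy(q[0])
print(pi0)                        # valid distribution favoring action 0
print(implied_reward(q, 0, 0, 1)) # reward implied by this Q at (s=0, a=0)
```

The point of the sketch is the one made in the abstract: once Q is learned from expert data, both reward and policy fall out of it in closed form, so no adversarial reward/policy alternation is needed.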

Target-absent Human Attention [article]

Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
2022 arXiv   pre-print
Our method integrates FFMs as the state representation in inverse reinforcement learning.  ...  We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature  ...  Based on soft Q-Learning [12], IQ-Learn encodes both the reward and the policy in a single Q-function, and thus is able to optimize both reward and policy simultaneously.  ... 
arXiv:2207.01166v1 fatcat:acozyh4g3nfldeg3wk4tqmcpzi

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [article]

Marwa Abdulhai, Natasha Jaques, Sergey Levine
2022 arXiv   pre-print
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.  ...  IRL can provide a generalizable and compact representation for apprenticeship learning, and enable accurately inferring the preferences of a human in order to assist them.  ...  Note that this formulation of Q-learning is equivalent to Soft Q-Learning Haarnoja et al. (2017), which is a maximum entropy RL method that can improve exploration, and is thus a reasonable choice for  ... 
arXiv:2208.04919v1 fatcat:ronnvvfvbfdg3b2yfkjbybreji
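The snippet above notes that the paper's Q-learning formulation is equivalent to Soft Q-Learning (Haarnoja et al., 2017), the maximum-entropy variant. The difference from standard Q-learning is a single change in the Bellman target: the hard max over next-state actions becomes a log-sum-exp. A tabular sketch under illustrative assumptions (the function name and toy MDP are invented here):

```python
import numpy as np

def soft_q_update(Q, s, a, r, s_next, alpha=1.0, gamma=0.99, lr=0.1):
    # Soft Q-learning target: r + gamma * alpha * log sum_a' exp(Q(s',a')/alpha).
    # Standard Q-learning would use r + gamma * max_a' Q(s',a') instead.
    v_next = alpha * np.log(np.sum(np.exp(Q[s_next] / alpha)))
    target = r + gamma * v_next
    Q = Q.copy()
    Q[s, a] += lr * (target - Q[s, a])  # TD step toward the soft target
    return Q

Q = np.zeros((2, 2))                 # toy MDP: 2 states x 2 actions
Q = soft_q_update(Q, s=0, a=0, r=1.0, s_next=1)
print(Q)                             # only the visited (s, a) entry moves
```

Because log-sum-exp exceeds the max, the soft target adds an entropy bonus that keeps the policy stochastic, which is the exploration benefit the snippet refers to.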

Imitation Learning by State-Only Distribution Matching [article]

Damian Boborzi, Christoph-Nikolas Straehle, Jens S. Buchner, Lars Mikelsons
2022 arXiv   pre-print
While many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator  ...  Imitation Learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task.  ...  Inverse soft-Q learning (IQ-Learn) [Garg et al., 2021] avoids adversarial training by learning a single Q-function that implicitly represents both reward and policy.  ... 
doi:10.48550/arxiv.2202.04332 fatcat:w2nrhjcqt5ek3k7tu5rjn3xjia