IQ-Learn: Inverse soft-Q Learning for Imitation
[article]
2022
arXiv
pre-print
Our method, Inverse soft-Q learning (IQ-Learn), obtains state-of-the-art results in offline and online imitation learning settings, significantly outperforming existing methods both in the number of required ...
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function, implicitly representing both reward and policy. ...
Approach: In this section, we develop our inverse soft-Q learning (IQ-Learn) algorithm such that it recovers the optimal soft Q-function for an MDP from a given expert distribution. ... (the core identities are sketched after this entry)
arXiv:2106.12142v3
fatcat:uqvskmi6dzfwbe3zqepwrtqjmu
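For reference, the identities at the heart of IQ-Learn (Garg et al., 2021) are short enough to state directly: the soft value is a log-sum-exp of Q, the reward implied by a Q-function follows from the inverse soft Bellman operator, and the soft-optimal policy is the softmax of the same Q. A minimal sketch in standard max-entropy notation (temperature 1 assumed):

```latex
V(s) = \log \sum_{a} \exp Q(s, a)
\qquad \text{(soft value)}

r(s, a) = Q(s, a) - \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\!\left[ V(s') \right]
\qquad \text{(inverse soft Bellman operator)}

\pi(a \mid s) = \frac{\exp Q(s, a)}{\sum_{a'} \exp Q(s, a')}
\qquad \text{(recovered policy)}
```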
Target-absent Human Attention
[article]
2022
arXiv
pre-print
Our method integrates FFMs as the state representation in inverse reinforcement learning. ...
We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature ...
Based on soft Q-learning [12], IQ-Learn encodes both the reward and the policy in a single Q-function, and is thus able to optimize both reward and policy simultaneously. ... (see the sketch after this entry)
arXiv:2207.01166v1
fatcat:acozyh4g3nfldeg3wk4tqmcpzi
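To make the "single Q-function encodes both reward and policy" point concrete, here is a minimal NumPy sketch; the tabular Q-table, uniform transition model, and discount factor are illustrative assumptions, not code from either paper:

```python
# Illustrative sketch: one tabular Q yields both the soft-optimal
# policy (softmax of Q) and an implied reward (inverse soft Bellman).
import numpy as np

np.random.seed(0)
gamma = 0.99                      # assumed discount factor
Q = np.random.randn(5, 3)         # toy Q-table: 5 states x 3 actions
P = np.full((5, 3, 5), 1.0 / 5)   # toy uniform transition model

V = np.log(np.exp(Q).sum(axis=1))             # soft value: logsumexp over actions
pi = np.exp(Q - V[:, None])                   # policy: softmax of Q per state
r = Q - gamma * np.einsum('san,n->sa', P, V)  # implied reward: Q - gamma * E[V(s')]

print(pi.sum(axis=1))  # each row sums to 1: a valid policy
```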
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
[article]
2022
arXiv
pre-print
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior. ...
IRL can provide a generalizable and compact representation for apprenticeship learning, and enable accurately inferring the preferences of a human in order to assist them. ...
Note that this formulation of Q-learning is equivalent to soft Q-learning (Haarnoja et al., 2017), which is a maximum entropy RL method that can improve exploration and is thus a reasonable choice for ... (see the sketch after this entry)
arXiv:2208.04919v1
fatcat:ronnvvfvbfdg3b2yfkjbybreji
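The soft Q-learning equivalence mentioned in the last snippet amounts to replacing the hard max in the Bellman backup with a temperature-scaled log-sum-exp. A minimal sketch, assuming a discrete action space and an illustrative temperature:

```python
# Sketch of the soft Bellman backup from Haarnoja et al. (2017): the
# hard max over next-state actions becomes a temperature-scaled
# log-sum-exp, which encourages exploration. Shapes are illustrative.
import numpy as np

alpha = 0.1   # entropy temperature (assumed value)
gamma = 0.99  # discount factor (assumed value)

def soft_q_target(r, q_next):
    """r: rewards (batch,); q_next: next-state Q-values (batch, n_actions)."""
    # Numerically stable soft value:
    # alpha * logsumexp(q / alpha) = m + alpha * logsumexp((q - m) / alpha)
    m = q_next.max(axis=1)
    v_next = m + alpha * np.log(np.exp((q_next - m[:, None]) / alpha).sum(axis=1))
    return r + gamma * v_next

# As alpha -> 0 this recovers the standard hard-max Q-learning target.
```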
Imitation Learning by State-Only Distribution Matching
[article]
2022
While many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator ...
Imitation Learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task. ...
Inverse soft-Q learning (IQ-Learn) [Garg et al., 2021] avoids adversarial training by learning a single Q-function that implicitly represents both reward and policy. ... (see the sketch after this entry)
doi:10.48550/arxiv.2202.04332
fatcat:w2nrhjcqt5ek3k7tu5rjn3xjia
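To show what the non-adversarial route looks like in practice, here is a heavily simplified PyTorch sketch of an IQ-Learn-style objective (Garg et al., 2021) with the identity regularizer; the Q-network interface, batch tensors, and discount value are assumptions for illustration, not the method of this paper:

```python
# Heavily simplified sketch of an IQ-Learn-style objective: train only a
# Q-function, no discriminator. Interfaces and phi = identity are assumed.
import torch

gamma = 0.99  # assumed discount factor

def iq_loss(q_fn, expert_s, expert_a, expert_next_s, init_s):
    """q_fn(s) -> (batch, n_actions) Q-values; all inputs are tensors."""
    def soft_v(s):
        return torch.logsumexp(q_fn(s), dim=1)  # V(s) = logsumexp_a Q(s, a)

    q_sa = q_fn(expert_s).gather(1, expert_a.unsqueeze(1)).squeeze(1)
    # Reward implied by Q on expert transitions: r = Q(s, a) - gamma * V(s')
    expert_reward = q_sa - gamma * soft_v(expert_next_s)
    # Maximize implied expert reward, anchored by the soft value at the
    # initial state distribution (the telescoped term in the objective).
    return -(expert_reward.mean() - (1 - gamma) * soft_v(init_s).mean())
```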