Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning [article]

Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson
2018 arXiv   pre-print
In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by  ...  The first problem is implicit bias present in the reward functions used in these algorithms.  ...  DISCRIMINATOR-ACTOR-CRITIC In this section we will present the Discriminator-Actor-Critic (DAC) algorithm.  ... 
arXiv:1809.02925v2 fatcat:bq47zx3xbzaclahyj4bp3imx2e

Addressing reward bias in Adversarial Imitation Learning with neutral reward functions [article]

Rohit Jena, Siddharth Agrawal, Katia Sycara
2020 arXiv   pre-print
Generative Adversarial Imitation Learning suffers from the fundamental problem of reward bias stemming from the choice of reward functions used in the algorithm.  ...  We provide a theoretical sketch of why existing reward functions would fail in imitation learning scenarios in task based environments with multiple terminal states.  ...  Conclusion In this work, we address the problem of reward bias in adversarial imitation learning.  ... 
arXiv:2009.09467v1 fatcat:5wtbnsizmzeddgqskh7yxtnuvy

Off-Policy Adversarial Inverse Reinforcement Learning [article]

Samin Yeasar Arnob
2020 arXiv   pre-print
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement learning (RL), which tries to imitate an expert without taking any reward from the environment and does not provide expert  ...  Adversarial Inverse Reinforcement Learning (AIRL) leverages the idea of AIL, integrates a reward function approximation along with learning the policy, and shows the utility of IRL in the transfer learning  ...  Implementation of Discriminator Actor-Critic algorithm used in this paper is initially implemented as ICLR-2019 Reproducibility Challenge along with Sheldon Benard and Vincent Luczkow [23].  ... 
arXiv:2005.01138v1 fatcat:pc73o264g5hy3g5s5e3s7kuope

Support-weighted Adversarial Imitation Learning [article]

Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris
2020 arXiv   pre-print
reward bias.  ...  Adversarial Imitation Learning (AIL) is a broad family of imitation learning methods designed to mimic expert behaviors from demonstrations.  ...  We highlight that SAIL may be efficiently applied on top of many existing AIL algorithms such as GAIL and Discriminator-Actor-Critic Kostrikov et al. (2019).  ... 
arXiv:2002.08803v1 fatcat:pnirzhs2xbaizohagtsi6nkedq

Generative Inverse Deep Reinforcement Learning for Online Recommendation [article]

Xiaocong Chen and Lina Yao and Aixin Sun and Xianzhi Wang and Xiwei Xu and Liming Zhu
2020 arXiv   pre-print
Deep reinforcement learning uses a reward function to learn user's interest and to control the learning process.  ...  To address the above issue, we propose a novel generative inverse reinforcement learning approach, namely InvRec, which extracts the reward function from user's behaviors automatically, for online recommendation  ...  However, there remain a few shortcomings which are not addressed in this paper such as the sample inefficiency problem for the imitation learning (Kostrikov et al. 2019).  ... 
arXiv:2011.02248v1 fatcat:g7swi666d5bkbjhw6busphp5ey

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning [article]

Yannick Schroecker, Charles Isbell
2020 arXiv   pre-print
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains.  ...  As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.  ...  Adversarial Inverse Reinforcement Learning uses an adversarial objective to learn a fixed reward function (Fu et al., 2018) while Discriminator Actor-Critic (Kostrikov et al., 2019) reduces the number  ... 
arXiv:2002.06473v1 fatcat:6urnmvioenccnibkocmixtlqmu

Reparameterized Variational Divergence Minimization for Stable Imitation [article]

Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan
2020 arXiv   pre-print
While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only  ...  We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising f-divergence minimization framework.  ...  Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning. In ICLR, 2018. Laskey, M., Staszak, S., Hsieh, W. Y.-S., Mahler, J., Pokorny, F.  ... 
arXiv:2006.10810v1 fatcat:zobyenu2rbfqfbjlk22yhus74u

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization [article]

Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai
2021 arXiv   pre-print
This alternated optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning.  ...  Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can  ...  Acknowledgments and Disclosure of Funding  ... 
arXiv:2006.13258v6 fatcat:eqqgv6thxresldwctjtt57h2bq

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [article]

Trevor Ablett, Bryan Chan, Jonathan Kelly
2021 arXiv   pre-print
Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information  ...  Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method.  ...  Addressing sample inefficiency and reward bias in inverse reinforcement learning.  ... 
arXiv:2112.08932v1 fatcat:wvpu4nqlnjc2za5g2ff6uxsksq

A Survey on Reinforcement Learning for Recommender Systems [article]

Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, Chunyan Miao
2022 arXiv   pre-print
In particular, Reinforcement Learning (RL) based recommender systems have become an emerging research topic in recent years.  ...  Empirical results show that RL-based recommendation methods often surpass most of supervised learning methods, owing to the interactive nature and autonomous learning ability.  ...  Later it uses a discriminative actor-critic network to evaluate the learned policy, based on the reward function defined by R(s, a) = log D(s, a) − log max( , 1 − log D(s, a)) + r (11), where log D(s, a) is  ... 
arXiv:2109.10665v2 fatcat:wx5ghn66hzg7faxee54jf7gspq

What Matters for Adversarial Imitation Learning? [article]

Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz
2021 arXiv   pre-print
Adversarial imitation learning has become a popular framework for imitation in continuous control.  ...  To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both  ...  penalty in the off-policy setup in the following environments: HalfCheetah, Hopper, Walker2d and Ant, as well as InvertedPendulum which we did not use due to its simplicity.  ... 
arXiv:2106.00672v1 fatcat:ahs272nwxncgjb5zkkt6ut27fu

The MAGICAL Benchmark for Robust Imitation [article]

Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell
2020 arXiv   pre-print
Imitation Learning (IL) algorithms are typically evaluated in the same environment that was used to create demonstrations.  ...  This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially  ...  Acknowledgments and Disclosure of Funding We would like to thank reviewers for helping to improve the presentation of the paper (in particular, clarifying the distinction between traditional IL and robust  ... 
arXiv:2011.00401v1 fatcat:lqateel4dfbdhiu54ufzehaca4

Online Apprenticeship Learning [article]

Lior Shani, Tom Zahavy, Shie Mannor
2021 arXiv   pre-print
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.  ...  We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms: one for policy optimization and another for learning the worst case cost.  ...  ., and Tompson, J. Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning. arXiv preprint arXiv:1809.02925, 2018.  ... 
arXiv:2102.06924v2 fatcat:vkszuwgds5aippquom6jqhn4s4

Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions [article]

Sinong Geng, Houssam Nassif, Carlos A. Manzanares, A. Max Reppen, Ronnie Sircar
2020 arXiv   pre-print
We name our method PQR, as it sequentially estimates the Policy, the Q-function, and the Reward function by deep learning.  ...  We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies.  ...  Acknowledgements The authors would like to thank Amazon Web Services for providing computational resources for the experiments in this paper.  ... 
arXiv:2007.07443v2 fatcat:pqkjwpbxqnbodg7g53hwml5q54