Imitation Learning from Imperfect Demonstration [article]

Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama
2019 arXiv   pre-print
Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect since collecting optimal ones is costly.  ...  To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations.  ...  Imitation learning with confidence and unlabeled data: In this section, we present two approaches to learning from imperfect demonstrations with confidence and unlabeled data.  ... 
arXiv:1901.09387v3 fatcat:3weigtcm5vcfldacwdzr7nkak4
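
The core idea this abstract describes can be sketched as a confidence-weighted behavior-cloning loss: each demonstrated state-action pair contributes in proportion to its confidence score, so low-quality demonstrations are down-weighted. Below is a minimal PyTorch sketch of that idea, not the paper's exact 2IWIL or IC-GAIL objective; the function name and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

def confidence_weighted_bc_loss(policy, states, actions, confidence):
    """Behavior cloning where each demonstration is weighted by a
    confidence score in [0, 1] describing its quality (illustrative
    sketch only, not the objective from the paper).

    states:     (N, state_dim) demonstrated states
    actions:    (N,) demonstrated discrete actions
    confidence: (N,) per-demonstration confidence scores
    """
    logits = policy(states)  # (N, num_actions)
    nll = nn.functional.cross_entropy(logits, actions, reduction="none")
    # Down-weight the negative log-likelihood of low-confidence
    # (likely imperfect) demonstrations.
    return (confidence * nll).sum() / confidence.sum()
```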

Learning from Imperfect Demonstrations from Agents with Varying Dynamics [article]

Zhangjie Cao, Dorsa Sadigh
2021 arXiv   pre-print
Imitation learning enables robots to learn from demonstrations. Previous imitation learning algorithms usually assume access to optimal expert demonstrations.  ...  We therefore address the problem of imitation learning when the demonstrations can be sub-optimal or be drawn from agents with varying dynamics.  ...  learning from imperfect demonstration: 2IWIL and IC-GAIL [7], and imitation learning from different dynamics: SAIL [28].  ... 
arXiv:2103.05910v1 fatcat:lb7junzg4nb5tchnrrlipra6ky

Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation [article]

Shani Gamrian, Yoav Goldberg
2019 arXiv   pre-print
The visual mapping from the target to the source domain is performed using unaligned GANs, resulting in a control policy that can be further improved using imitation learning from imperfect demonstrations  ...  Despite the remarkable success of Deep RL in learning control policies from raw pixels, the resulting models do not generalize.  ...  Acknowledgements: We thank Hal Daumé III for the helpful discussions on the Imitation Learning algorithm during the development of the work.  ... 
arXiv:1806.07377v6 fatcat:7a5lxvb57rahxbw36di4nccp5q
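
The transfer recipe in this abstract (translate target-domain frames into the source visual domain with a GAN generator, then act with the source-trained policy) reduces to a small inference step. A hedged sketch follows; `generator_t2s` and `policy_src` are hypothetical names standing in for the trained translation and policy networks.

```python
import torch

def act_in_target_domain(policy_src, generator_t2s, obs_target):
    """Run a source-domain policy in the target domain by first
    translating target observations to the source visual domain with
    an unaligned GAN generator (sketch of the setup, not the authors'
    code; names and interfaces are assumptions).
    """
    with torch.no_grad():
        obs_as_source = generator_t2s(obs_target)  # image-to-image translation
        return policy_src(obs_as_source).argmax(dim=-1)
```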

Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in Amateur Data in Imitation Learning [article]

Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
2021 arXiv   pre-print
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.  ...  However, demonstrations performed by human operators often contain noise or imperfect behaviors that can affect the efficiency of the imitator if left unchecked.  ...  robots that can naturally learn from demonstrations [5], [6].  ... 
arXiv:2108.00625v1 fatcat:xtrbsdmbzvcefjplrlwxwixie4
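
The t-momentum idea is that the first-moment (momentum) estimate of the gradient should be a robust mean: each incoming gradient gets a Student-t weight that shrinks when the gradient deviates strongly from the running mean, so outlier gradients caused by noisy amateur demonstrations barely move it. A simplified NumPy sketch under that reading; the update constants and variable names are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def t_momentum_update(m, W, grad, var, dof=1.0, beta=0.9):
    """One step of a (simplified) t-momentum estimate of the gradient mean.

    Gradients far from the running mean m (relative to the running
    variance var) receive a small Student-t weight w, so outliers have
    little influence on the momentum.
    """
    d = grad.size
    dev = np.sum((grad - m) ** 2 / (var + 1e-8))
    w = (dof + d) / (dof + dev)          # Student-t weight of this sample
    m = (W * m + w * grad) / (W + w)     # robust weighted mean of gradients
    W = (2 * beta - 1) / beta * W + w    # decayed total weight
    return m, W
```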

Learning from Imperfect Demonstrations via Adversarial Confidence Transfer [article]

Zhangjie Cao, Zihan Wang, Dorsa Sadigh
2022 arXiv   pre-print
We therefore study the problem of learning from imperfect demonstrations by learning a confidence predictor.  ...  Existing learning from demonstration algorithms usually assume access to expert demonstrations.  ...  Our contribution is a new confidence-based imitation learning algorithm that learns from imperfect demonstrations.  ... 
arXiv:2202.02967v2 fatcat:mokaojfknfexzewmpg5rapd7y4

Reinforcement Learning from Imperfect Demonstrations [article]

Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell
2019 arXiv   pre-print
Robust real-world learning should benefit from both demonstrations and interactions with the environment.  ...  Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received  ...  It combines an imitation hinge loss with the Q-learning loss in order to learn from demonstrations and transfer to environments smoothly.  ... 
arXiv:1802.05313v2 fatcat:ioizvvxuivd2ril5duzz62h5em
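
The "imitation hinge loss" the snippet mentions is the large-margin loss popularized by DQfD: on demonstration data, the Q-value of the demonstrated action must exceed every other action's Q-value by a margin, and the shortfall is added to the usual TD loss. A minimal PyTorch sketch of that margin term, assuming discrete actions:

```python
import torch

def margin_imitation_loss(q_values, demo_actions, margin=0.8):
    """DQfD-style large-margin loss:
    max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l(a_E, a) = margin
    for a != a_E and 0 otherwise. Typically added to the TD loss on
    demonstration transitions.

    q_values:     (N, num_actions) predicted Q-values
    demo_actions: (N,) demonstrated actions a_E
    """
    n = q_values.size(0)
    l = torch.full_like(q_values, margin)
    l[torch.arange(n), demo_actions] = 0.0           # no margin at a_E
    q_demo = q_values[torch.arange(n), demo_actions]
    return ((q_values + l).max(dim=1).values - q_demo).mean()
```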

Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance [article]

Mingxuan Jing, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Chao Yang, Bin Fang, Huaping Liu
2019 arXiv   pre-print; also in the 2020 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20)
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations.  ...  To work on imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues in terms of optimality and convergence  ...  It was also partially supported by the National Science Foundation of China (NSFC) and the German Research Foundation (DFG) in project Cross Modal Learning, NSFC 61621136008/DFG TRR-169.  ... 
arXiv:1911.07109v2 fatcat:ms5ms5c24bfrzas4v5m3binfl4
doi:10.1609/aaai.v34i04.5953 fatcat:pkw7h5h7mfduni325g4ko6mheq
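
One natural formalization consistent with this abstract is to treat the (possibly imperfect) expert as a soft constraint rather than a reward bonus: maximize return while keeping the policy within a divergence ball around the expert. A sketch in LaTeX, where the divergence D and the radius epsilon are modeling choices rather than the paper's exact notation:

```latex
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad D\big(\pi \,\|\, \pi_E\big) \le \epsilon
```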

End-to-End Refinement Guided by Pre-trained Prototypical Classifier [article]

Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John Gregoire, Carla Gomes
2018 arXiv   pre-print
We propose imitation refinement, a novel approach to refine imperfect input patterns, guided by a pre-trained classifier incorporating prior knowledge from simulated theoretical data, such that the refined  ...  The refiner learns to refine the imperfect patterns with small modifications, such that their embeddings are closer to the corresponding prototypes.  ...  We want to learn such knowledge from the ideally simulated data and further guide the refinement of the imperfect XRD patterns.  ... 
arXiv:1805.08698v2 fatcat:4o3zb2ttlrdczfy7mb2rhnxvom
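
The refinement objective described in the snippet has two pulls: move the refined pattern's embedding toward its class prototype while keeping the edit small. A hedged PyTorch sketch of such a loss; the interface and the fidelity-penalty weight are assumptions, not the paper's code.

```python
import torch

def imitation_refinement_loss(embed, prototypes, x, x_refined, labels, lam=0.1):
    """Pull refined patterns toward their class prototypes while
    penalizing large edits (illustrative sketch only).

    embed:        pre-trained embedding network f
    prototypes:   (C, emb_dim) class prototypes from ideal simulated data
    x, x_refined: original and refined input patterns, (N, ...)
    labels:       (N,) class index of each pattern
    """
    z = embed(x_refined)
    proto_term = ((z - prototypes[labels]) ** 2).sum(dim=1).mean()
    # Keep the refinement a "small modification" of the input.
    fidelity_term = ((x_refined - x) ** 2).flatten(1).sum(dim=1).mean()
    return proto_term + lam * fidelity_term
```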

Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [article]

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
2020 arXiv   pre-print
On the other hand, imitation learning (IL) learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations.  ...  In this work, we propose Self-Adaptive Imitation Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations for highly challenging sparse reward  ...  Unlike other imitation learning baselines whose performance is limited by the demonstrations, SAIL can rapidly surpass the imperfect teacher via constructing a better demonstration buffer and gradually  ... 
arXiv:2004.00530v1 fatcat:cak5zphczjdypetfhjiuuen25m
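
"Constructing a better demonstration buffer" suggests a simple ratchet: whenever one of the agent's own trajectories outperforms the worst stored demonstration, swap it in, so the teacher signal improves over training. A plain-Python sketch of that idea (the replacement rule is an assumption, not necessarily SAIL's exact mechanism):

```python
def maybe_update_demo_buffer(demo_buffer, trajectory, traj_return):
    """Replace the worst demonstration whenever the agent's own
    trajectory beats it, so buffer quality only ever increases.

    demo_buffer: non-empty list of (trajectory, return) pairs
    """
    worst_idx = min(range(len(demo_buffer)), key=lambda i: demo_buffer[i][1])
    if traj_return > demo_buffer[worst_idx][1]:
        demo_buffer[worst_idx] = (trajectory, traj_return)
```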

OIL: Observational Imitation Learning

Guohao Li, Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem
2019 Robotics: Science and Systems XV; also available as an arXiv pre-print
To this end, we propose Observational Imitation Learning (OIL), a novel imitation learning variant that supports online training and automatic selection of optimal behavior by observing multiple imperfect  ...  Extensive experiments demonstrate that our trained network outperforms its teachers, conventional imitation learning (IL) and reinforcement learning (RL) baselines, and even humans in simulation.  ...  In contrast to pure IL, OIL avoids learning bad demonstrations from imperfect teachers by observing the teachers' behaviors and estimating the advantage or disadvantage of imitating them.  ... 
doi:10.15607/rss.2019.xv.005 dblp:conf/rss/LiMCSMG19 fatcat:pjlsjkgkbnhsdchlvvqhp45ohi
arXiv:1803.01129v3 fatcat:6qf5lxsaxrhqpafblucynnhocu
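
The selection mechanism the snippet alludes to (imitate a teacher only when doing so looks advantageous) can be sketched with a learned critic scoring each candidate action. The interface below is hypothetical and stands in for OIL's actual advantage estimation:

```python
def select_supervision(critic, state, teacher_actions, policy_action):
    """Choose which imperfect teacher (if any) to imitate at this state
    by comparing a learned critic's value estimates. Illustrative
    sketch only; critic and action interfaces are assumptions.
    """
    best_action, best_value = policy_action, critic(state, policy_action)
    for action in teacher_actions:
        value = critic(state, action)
        if value > best_value:  # imitate only advantageous teachers
            best_action, best_value = action, value
    return best_action
```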

Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [article]

Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov
2020 arXiv   pre-print
There are two main approaches to improving the sample efficiency of reinforcement learning methods: using hierarchical methods and using expert demonstrations.  ...  Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments.  ...  The use of demonstrations occurs once, during the imitation phase, in which the agent learns to imitate the demonstrator.  ... 
arXiv:2006.09939v1 fatcat:2ttjpnaevzgcvkva4nrbblphna

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning [article]

Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang
2021 arXiv   pre-print
To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.  ...  self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness.  ...  Learning a robust policy from imperfect demonstrations is difficult [Wu et al., 2019] .  ... 
arXiv:2112.04907v1 fatcat:sz4tps5w25a37mrdfjvcaaqe6y
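
"Ensemble behavior cloning with consistency filtering" suggests training several cloned policies and keeping only the demonstration steps they agree on, which screens out imperfect actions. A hedged PyTorch sketch; the agreement threshold and interfaces are assumptions, not the paper's implementation:

```python
import torch

def consistency_filter(ensemble, states, demo_actions, min_agree=0.8):
    """Keep only demonstration steps on which an ensemble of
    behavior-cloned policies agrees with the demonstrated action.

    ensemble: list of policies mapping states -> (N, num_actions) logits
    """
    votes = torch.stack([p(states).argmax(dim=-1) for p in ensemble])  # (K, N)
    agree = (votes == demo_actions.unsqueeze(0)).float().mean(dim=0)   # (N,)
    keep = agree >= min_agree
    return states[keep], demo_actions[keep]
```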

Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft [article]

Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov
2020 arXiv   pre-print
HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories.  ...  We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data.  ...  Conclusion: In this paper, we introduce a novel approach to learning from imperfect demonstrations.  ... 
arXiv:1912.08664v4 fatcat:c7t67u2vxzgmjcqxmtwds4ve3i
Showing results 1–15 of 22,955