201,714 Hits in 2.5 sec

Self-Imitation Learning [article]

Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee
2018 arXiv   pre-print
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions.  ...  Self-Imitation Learning The goal of self-imitation learning (SIL) is to imitate the agent's past good experiences in the actor-critic framework.  ...  Thus, the agent can benefit more from self-imitation learning because self-imitation learning captures such rare experiences and learns from them.  ...
arXiv:1806.05635v1 fatcat:a5i3fgkt2fajnl53de4erutjwy
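The idea in the SIL snippet, imitating only those past decisions that turned out better than expected, can be illustrated with a clipped advantage: a stored transition contributes only when its observed return R exceeds the current value estimate V(s). The sketch below is a minimal illustration under that assumption, not the authors' code; the function names `discounted_returns` and `sil_losses` and the value-loss weight `beta` are hypothetical choices made here.

```python
def discounted_returns(rewards, gamma):
    """Discounted Monte Carlo return R_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sil_losses(log_probs, values, returns, beta=0.01):
    """Self-imitation style losses over a batch of stored transitions.

    Assumption: the "good experience" filter is the clipped advantage
    max(R - V(s), 0), so transitions whose return did not beat the value
    estimate contribute nothing. `beta` is a hypothetical weight.
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_p, v, r in zip(log_probs, values, returns):
        adv = max(r - v, 0.0)          # (R - V(s))_+
        # In an autodiff framework, adv would be detached in this term.
        policy_loss += -log_p * adv    # raise log pi(a|s) for good actions
        value_loss += 0.5 * adv ** 2   # pull V(s) up toward R, never down
    n = len(returns)
    return policy_loss / n, beta * value_loss / n
```

Because the advantage is clipped at zero, the value term can only raise V(s) toward returns that beat it; bad past episodes are simply ignored rather than actively discouraged.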

Self-Imitation Advantage Learning [article]

Johan Ferret, Olivier Pietquin, Matthieu Geist
2020 arXiv   pre-print
Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems.  ...  We propose SAIL, a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator that we connect to Advantage Learning.  ...  RELATED WORK Extending self-imitation learning. Guo et al.  ... 
arXiv:2012.11989v1 fatcat:u3h7ugxgzzf6zapvmzivfabg5u
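The SAIL snippet connects its modified Bellman optimality operator to Advantage Learning. SAIL's own operator is not given in this listing, so the sketch below shows only the standard Advantage Learning target it builds on: the usual Q-learning target minus a penalty proportional to the action gap at the current state. The function name and argument layout are hypothetical.

```python
def advantage_learning_target(reward, q_s, action, q_next, gamma, alpha, done=False):
    """Advantage Learning target for one transition (s, a, r, s').

    q_s:    list of Q(s, b) for every action b at the current state
    q_next: list of Q(s', b) for every action b at the next state
    alpha:  action-gap coefficient in [0, 1); alpha = 0 recovers the
            ordinary Q-learning (Bellman optimality) target
    """
    bootstrap = 0.0 if done else gamma * max(q_next)
    bellman_target = reward + bootstrap
    # Penalise actions that are worse than the greedy action at s,
    # which widens the gap between the best action and the rest.
    gap = max(q_s) - q_s[action]
    return bellman_target - alpha * gap
```

For the greedy action the gap is zero and the target reduces to the usual one; only suboptimal actions are pushed down.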

Self-Imitation Learning via Generalized Lower Bound Q-learning [article]

Yunhao Tang
2021 arXiv   pre-print
Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning.  ...  In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms.  ...  based self-imitation learning [8].  ...
arXiv:2006.07442v3 fatcat:lwuisb33u5ca5cx6f4w3vepwf4

Self-Imitation Learning by Planning [article]

Sha Luo, Hamidreza Kasaei, Lambert Schomaker
2021 arXiv   pre-print
In this work, we solve this problem using our proposed approach called self-imitation learning by planning (SILP), where demonstration data are collected automatically by planning on the visited states  ...  Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration.  ...  As we plan demonstrations online automatically by utilizing the self-generated states from the current policy, we name our approach as self-imitation learning by planning (SILP).  ... 
arXiv:2103.13834v2 fatcat:qofm3ggeljgfdn3exrydwajqli

Generative Adversarial Self-Imitation Learning [article]

Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee
2018 arXiv   pre-print
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via  ...  generative adversarial imitation learning framework.  ...  Generative Adversarial Self-Imitation Learning The main idea of Generative Adversarial Self-Imitation Learning (GASIL) is to update the policy to imitate past good trajectories using GAIL framework (see  ... 
arXiv:1812.00950v1 fatcat:tataf5lntng5fhpp34btya6kbe
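Imitating "past good trajectories" with an adversarial discriminator requires some store of what counts as good. One simple option, assumed here rather than taken from the snippet, is a bounded buffer that keeps the K highest-return episodes seen so far; the class name, `k`, and the heap-based design are illustrative choices.

```python
import heapq
import itertools


class TopKTrajectoryBuffer:
    """Keep only the K highest-return episodes seen so far (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self._heap = []                   # min-heap of (return, tie_breaker, trajectory)
        self._counter = itertools.count() # unique tie-breaker, so trajectories are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            # Evict the worst stored episode in favour of the new one.
            heapq.heapreplace(self._heap, item)

    def trajectories(self):
        """Stored episodes, best return first (ties in insertion order)."""
        ordered = sorted(self._heap, key=lambda item: (-item[0], item[1]))
        return [traj for _, _, traj in ordered]
```

A min-heap keeps the worst stored episode at the root, so deciding whether a new episode displaces it costs O(log K).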

Learning Self-Imitating Diverse Policies [article]

Tanmay Gangwani, Qiang Liu, Jian Peng
2019 arXiv   pre-print
In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings.  ...  We then discuss limitations of self-imitation learning, and propose to solve them by using Stein variational policy gradient descent with the Jensen-Shannon kernel to learn multiple diverse policies.  ...  This algorithm can be seen as self-imitation learning, in which the expert trajectories in the experience replays are self-generated by the agent during the course of learning, rather than using some external  ... 
arXiv:1805.10309v2 fatcat:3wjitdpiffdxjmiibxvanafdqe

Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [article]

Zhixin Chen, Mengxiang Lin
2021 arXiv   pre-print
In this paper, we propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).  ...  The application of reinforcement learning (RL) in robotic control is still limited in the environments with sparse and delayed rewards.  ...  Fig. 1: Our self-imitation learning framework for robot learning.  ...
arXiv:2010.06962v3 fatcat:bjdjkj7o7jcotoj7nzexrw73ku

Using Self-Imitation to Direct Learning

Joe Saunders, Chrystopher Nehaniv, Kerstin Dautenhahn
2006 ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication  
Self-imitation is where an agent is able to learn and replicate actions it has experienced through the manipulation of its body by another.  ...  An evolutionary predecessor to observational imitation may have been self-imitation.  ...  They support a form of self-imitation that may be the natural precursor to more complex forms of imitative learning. In our framework we use the idea of putting through directly.  ... 
doi:10.1109/roman.2006.314425 dblp:conf/ro-man/SaundersND06 fatcat:tpnywxoulfaxxfzcb4vqtcyobu

Episodic Self-Imitation Learning with Hindsight

Tianhong Dai, Hengyan Liu, Anil Anthony Bharath
2020 Electronics  
Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning.  ...  Compared to the original self-imitation learning algorithm, which samples good state–action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation  ...  Figure 1. Illustration of difference between self-imitation learning (SIL)+hindsight experience replay (HER) and episodic self-imitation learning (ESIL).  ...
doi:10.3390/electronics9101742 fatcat:rxpcgcsn2zcstfugpzjdzqcsre

Self-Imitation Learning of Locomotion Movements through Termination Curriculum [article]

Amin Babadi, Kourosh Naderi, Perttu Hämäläinen
2019 arXiv   pre-print
In this paper, we propose and evaluate a novel combination of techniques for accelerating the learning of stable locomotion movements through self-imitation learning of synthetic animations.  ...  This allows us to use reinforcement learning with Reference State Initialization (RSI) to find a neural network controller for imitating the synthesized reference motion.  ...  In this paper, we propose a self-imitation learning approach for enabling rapid learning of stable locomotion controllers.  ... 
arXiv:1907.11842v2 fatcat:fe6yxsd3svbepd5ctka66jbaza

Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning [article]

Jinghuan Shang, Michael S. Ryoo
2021 arXiv   pre-print
Humans learn to imitate by observing others. However, robot imitation learning generally requires expert demonstrations in the first-person view (FPV).  ...  Third-person imitation learning (TPIL) is the concept of learning action policies by observing other agents in a third-person view (TPV), similar to what humans do.  ...  TCN [5] uses a time-contrastive way to learn representations by self-supervised metric learning.  ... 
arXiv:2108.01069v1 fatcat:w4kswzgqiffa3ifwn34src4s24

Multimodal imitation using self-learned sensorimotor representations

Martina Zambelli, Yiannis Demiris
2016 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)  
self-learned multimodal sensorimotor relations, without the need of solving inverse kinematic problems or explicit analytical models formulation.  ...  We evaluate the proposed method on a humanoid iCub robot learning to interact with a piano keyboard and imitating a human demonstration.  ...  Multimodal information is then crucial to improve skills and learned self-representations. Imitation learning methods have been shown effective in enhancing complex robots' skills [1, 2].  ...
doi:10.1109/iros.2016.7759582 dblp:conf/iros/ZambelliD16 fatcat:3pjp76nvcvd2lmqxa2w3ntb5pa

Self-Practice Imitation Learning from Weak Policy [chapter]

Qing Da, Yang Yu, Zhi-Hua Zhou
2013 Lecture Notes in Computer Science  
Imitation learning is an effective strategy to reinforcement learning, which avoids the delayed reward problem by learning from mentor-demonstrated trajectories.  ...  A limitation for imitation learning is that collecting sufficient qualified demonstrations is quite expensive.  ...  The LEWE Framework We propose the LEarning from WEak policy (LEWE) framework that outlines the self-improve procedure for an agent, as shown in Algorithm 1.  ... 
doi:10.1007/978-3-642-40705-5_2 fatcat:rgsn57qubva2dctz5vcbs4764y

Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps [article]

Martin Stetter, Elmar W. Lang
2021 arXiv   pre-print
Humans seem to learn rich representations by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks.  ...  Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures.  ...  There are similar approaches which address intuitive physics learning on the basis of self-organizing maps.  ... 
arXiv:2007.01647v3 fatcat:wrv62evc25b4rbzlgu7b6j2zjm

Common Sensorimotor Representation for Self-initiated Imitation Learning [chapter]

Yasser Mohammad, Yoshimasa Ohmoto, Toyoaki Nishida
2012 Lecture Notes in Computer Science  
This paper reports on a series of experiments comparing these two alternatives for self-initiated imitation tasks.  ...  Internal representation is an important design decision in any imitation learning system. Actions and perceptual spaces were separate in classical AI due to the standard sense-process-act loop.  ...  Practically, what is most important in self-initiated imitation is having high accuracy in marking the boundaries of behaviors to be learned.  ... 
doi:10.1007/978-3-642-31087-4_40 fatcat:md6bokzh3rcgvenzbwo5oz5qyi
Showing results 1-15 out of 201,714 results