61,085 Hits in 5.8 sec

On the Value of Interaction and Function Approximation in Imitation Learning

Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran
2021 Neural Information Processing Systems  
This establishes a clear and provable separation of the minimax rates between the active setting and the no-interaction setting. We also study IL with linear function approximation.  ...  We study imitation learning under the µ-recoverability assumption of [27], which assumes that the difference in the Q-value under the expert policy across different actions in a state does not deviate beyond  ...  Linear function approximation in the no-interaction setting In this section, we go beyond the tabular setting and study IL in the presence of function approximation.  ... 
dblp:conf/nips/RajaramanHYLJR21 fatcat:o53cccrg4nhc5m5grfog2d6zui
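The "no-interaction setting" in the entry above is the regime where the learner only sees a fixed batch of expert trajectories, i.e., plain behavioral cloning. A minimal tabular sketch of that baseline, with illustrative state/action names not taken from the paper:

```python
from collections import Counter, defaultdict

def behavioral_cloning(demos):
    """Tabular behavioral cloning: at every state observed in the
    expert demonstrations, replay the expert's most frequent action."""
    counts = defaultdict(Counter)
    for trajectory in demos:
        for state, action in trajectory:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy demonstrations: three trajectories of (state, action) pairs.
demos = [[("s0", "left"), ("s1", "right")],
         [("s0", "left"), ("s1", "left")],
         [("s0", "left"), ("s1", "right")]]
policy = behavioral_cloning(demos)  # {"s0": "left", "s1": "right"}
```

States never visited by the expert have no entry in the learned policy, which is exactly the compounding-error weakness that separates this setting from the active (interactive) one.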

SS-MAIL: Self-Supervised Multi-Agent Imitation Learning [article]

Akshay Dharmavaram, Tejus Gupta, Jiachen Li, Katia P. Sycara
2021 arXiv   pre-print
The current landscape of multi-agent expert imitation is broadly dominated by two families of algorithms - Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL).  ...  In this work, we address this issue by introducing a novel self-supervised loss that encourages the discriminator to approximate a richer reward function.  ...  shifts in the value-function.  ... 
arXiv:2110.08963v1 fatcat:3ydm4toxmnb73kjis5azijymae
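Several of the adversarial imitation entries above (SS-MAIL, Dyna-AIL, PS-GAIL, CoDAIL) build on the standard GAIL discriminator objective that work like SS-MAIL then modifies. A hedged sketch of that baseline objective, with hand-picked discriminator outputs standing in for a trained network:

```python
import numpy as np

def gail_discriminator_loss(d_expert, d_policy):
    """Binary cross-entropy the GAIL discriminator minimizes: it should
    score expert state-action pairs near 1 and policy pairs near 0.
    d_expert / d_policy are discriminator outputs in (0, 1)."""
    d_expert = np.asarray(d_expert, dtype=float)
    d_policy = np.asarray(d_policy, dtype=float)
    return -(np.log(d_expert).mean() + np.log(1.0 - d_policy).mean())

# A discriminator that separates expert from policy data incurs a
# lower loss than one that cannot tell the two apart.
good = gail_discriminator_loss([0.9, 0.95], [0.1, 0.05])
confused = gail_discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The policy is then rewarded for fooling this discriminator; richer surrogate rewards, as in SS-MAIL, replace or augment this single scalar signal.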

Embodiment Adaptation from Interactive Trajectory Preferences

Michael Walton, Benjamin Migliori, John Reeder
2018 European Conference on Principles of Data Mining and Knowledge Discovery  
for value function approximation [6] and policy learning [5].  ...  After each interaction, a pairwise preference is assigned between the two trajectories and a reward function approximation r is estimated using the method specified in [1].  ... 
dblp:conf/pkdd/WaltonMR18 fatcat:ngpcphyb7rdbvcnchqt5ii4k7a

Dyna-AIL : Adversarial Imitation Learning by Planning [article]

Vaibhav Saxena, Srinivasan Sivanandan, Pulkit Mathur
2019 arXiv   pre-print
interactions in comparison to the state-of-the-art learning methods.  ...  Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions for convergence.  ...  function approximator for D and π, and use a set of expert trajectories to calculate the expectation w.r.t. π E .  ... 
arXiv:1903.03234v1 fatcat:oddi5xsxqzdabgnv5fbcfzf4r4

Improving imitated grasping motions through interactive expected deviation learning

Kathrin Gräve, Jörg Stückler, Sven Behnke
2010 2010 10th IEEE-RAS International Conference on Humanoid Robots  
Our method combines the advantages of reinforcement and imitation learning in a single coherent framework.  ...  One of the major obstacles that hinders the application of robots to human day-to-day tasks is the current lack of flexible learning methods to endow the robots with the necessary skills and to allow them  ...  The seamless integration of both learning types in our framework is in contrast to existing approaches that non-interactively chain imitation and reinforcement learning.  ... 
doi:10.1109/ichr.2010.5686846 dblp:conf/humanoids/GraveSB10 fatcat:leaztpyyf5dddd6vb7c4vb2cyu

Affordances, development and imitation

Luis Montesano, Manuel Lopes, Alexandre Bernardino, José Santos-Victor
2007 2007 IEEE 6th International Conference on Development and Learning  
The key concept is a general model for affordances able to learn the statistical relations between actions, object properties and the effects of actions on objects.  ...  To evaluate the approach, we provide results of affordance learning with a real robot and simple imitation games with people.  ...  ACKNOWLEDGMENTS This work was (partially) supported by the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III, and by the EU Projects (IST-004370) RobotCub and (EU-FP6-NEST  ... 
doi:10.1109/devlrn.2007.4354054 fatcat:ttvc6kufffbyhcmskkmhfhwpgu

Multi-Agent Imitation Learning for Driving Simulation [article]

Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer
2018 arXiv   pre-print
Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.  ...  Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models.  ...  ACKNOWLEDGMENTS Toyota Research Institute (TRI) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not TRI or any  ... 
arXiv:1803.01044v1 fatcat:c7x7bbxcejh7ddtxdfockkojcq

Learning Self-Imitating Diverse Policies [article]

Tanmay Gangwani, Qiang Liu, Jian Peng
2019 arXiv   pre-print
The success of popular algorithms for deep reinforcement learning, such as policy-gradients and Q-learning, relies heavily on the availability of an informative reward signal at each timestep of the sequential  ...  In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings.  ...  The derivation of the approximation and the underlying assumptions are in Appendix 5.1.  ... 
arXiv:1805.10309v2 fatcat:3wjitdpiffdxjmiibxvanafdqe

Experience, Imitation and Reflection; Confucius' Conjecture and Machine Learning [article]

Amir Ramezani Dooraki
2018 arXiv   pre-print
Regarding the learning methods of humans, Confucius' point of view is that they are by experience, imitation and reflection.  ...  Having that in mind, and considering the several existing machine learning methods, this question arises: 'What are some of the best ways for a machine to learn?'  ...  In order to tackle these problems a function approximator can be used in order to find the optimal values of each action or state.  ... 
arXiv:1808.00222v1 fatcat:pcugrxd5cfd53o4a4jjo5val2i

Multi-Agent Interactions Modeling with Correlated Policies [article]

Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu
2020 arXiv   pre-print
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies,  ...  Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods.  ...  The corresponding author Weinan Zhang is supported by NSFC (61702327, 61772333, 61632017) .  ... 
arXiv:2001.03415v3 fatcat:nu63toybuvhmhkrvbqd2edlikq

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [article]

Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
2021 arXiv   pre-print
We provide empirical evidence for the effectiveness of ΨΦ-learning as a method for improving RL, IRL, imitation, and few-shot transfer, and derive worst-case bounds for its performance in zero-shot transfer  ...  We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment.  ...  We thank Pablo Samuel Castro, Anna Harutyunyan, RAIL, OATML and IRIS lab members for their helpful feedback. We also thank the anonymous reviewers for useful comments during the review process.  ... 
arXiv:2102.12560v2 fatcat:iihuwyxiyvduvgc7em5hahxway

Memes in Artificial Life Simulations of Life History Evolution

John A. Bullinaria
2010 Workshop on the Synthesis and Simulation of Living Systems  
This paper extends the previous study by incorporating imitation and memes to provide a more complete account of learning as a factor in Life History Evolution.  ...  The effect that learning has on Life History Evolution has recently been studied using a series of Artificial Life simulations in which populations of competing individuals evolve to learn to perform well  ...  In many ways, the relevant trade-offs are clear from a theoretical point of view, but the interactions are complex and highly dependent on the associated parameters.  ... 
dblp:conf/alife/Bullinaria10 fatcat:hbvua4hlffcyrmmtnydq7kzagi

The Limits of Optimal Pricing in the Dark [article]

Quinlan Dawkins, Minbiao Han, Haifeng Xu
2021 arXiv   pre-print
A ubiquitous learning problem in today's digital market is, during repeated interactions between a seller and a buyer, how a seller can gradually learn optimal pricing decisions based on the buyer's past  ...  That is, before the pricing game starts, the buyer simply commits to "imitate" a different value function by pretending to always react optimally according to this imitative value function.  ...  In fact, the buyer could even just report his imitative value function to the seller directly at the beginning of any interaction.  ... 
arXiv:2110.01707v1 fatcat:wkhe5o7iuvbhdkyeh3kaljtenm

Direct on-line imitation of human faces with hierarchical ART networks

Patrick Holthaus, Sven Wachsmuth
2013 2013 IEEE RO-MAN  
The marker-less method depends solely on the interactant's face as input; it does not use a set of basic emotions and is thus capable of displaying a large variety of facial expressions.  ...  This work-in-progress paper presents an on-line system for robotic heads capable of mimicking humans.  ...  We also greatly acknowledge the support of student assistant Marian Pohling in the technical realization of this work.  ... 
doi:10.1109/roman.2013.6628502 dblp:conf/ro-man/HolthausW13 fatcat:r2yth6hlura27o3b6prmo7qdki

Behavioral Cloning from Observation

Faraz Torabi, Garrett Warnell, Peter Stone
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
We experimentally compare BCO to imitation learning methods, including the state-of-the-art generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several  ...  Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations.  ...  The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.  ... 
doi:10.24963/ijcai.2018/687 dblp:conf/ijcai/TorabiWS18 fatcat:ykal6qlt2jgehfsbpryh26zebe
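The BCO recipe in the entry above (demonstrations contain states only, so a learned inverse-dynamics model infers the missing actions before ordinary behavioral cloning is applied) can be caricatured in a few lines. The hand-written `infer` below is an illustrative stand-in for the inverse-dynamics model the paper actually trains:

```python
def behavioral_cloning_from_observation(state_demos, inverse_dynamics):
    """Sketch of the BCO idea: label each observed state transition with
    the action the inverse-dynamics model believes caused it, producing
    (state, action) pairs suitable for standard behavioral cloning.
    inverse_dynamics is assumed to map (state, next_state) to an action."""
    labeled = []
    for trajectory in state_demos:
        for s, s_next in zip(trajectory, trajectory[1:]):
            labeled.append((s, inverse_dynamics(s, s_next)))
    return labeled

# Toy 1-D world: moving right is action +1, moving left is -1.
infer = lambda s, s_next: 1 if s_next > s else -1
pairs = behavioral_cloning_from_observation([[0, 1, 2, 1]], infer)
# pairs == [(0, 1), (1, 1), (2, -1)]
```

In the actual method, the inverse-dynamics model is itself learned from the agent's own environment interaction, which is what lets BCO work without action labels in the demonstrations.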