15,056 Hits in 5.0 sec

Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations [article]

Daniel S. Brown, Wonjoon Goo, Scott Niekum
2019 arXiv   pre-print
To address these issues, we first contribute a sufficient condition for better-than-demonstrator imitation learning and provide theoretical results showing why preferences over demonstrations can better  ...  While recent empirical results demonstrate that ranked demonstrations allow for better-than-demonstrator performance, preferences over demonstrations may be difficult to obtain, and little is known theoretically  ...  Problem Statement Our goal is to achieve better-than-demonstrator performance via imitation learning.  ... 
arXiv:1907.03976v3 fatcat:lldvi7dsjnhe3dxirzfdursdve

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [article]

Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
2019 arXiv   pre-print
more than twice the performance of the best demonstration.  ...  In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in  ...  BCO and GAIL usually fail to perform better than the average demonstration performance because they explicitly seek to imitate the demonstrator rather than infer the demonstrator's intention.  ... 
arXiv:1904.06387v5 fatcat:rglnjfhb2zg5reugvmxgv4sofi
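The T-REX snippet above describes learning a reward from (approximately) ranked demonstrations and then extrapolating beyond them. As a rough illustration of the core idea, the pairwise ranking objective can be sketched as a Bradley-Terry-style cross-entropy over trajectory pairs; the linear reward model and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def trajectory_return(reward_weights, trajectory):
    """Sum of predicted per-state rewards under a simple linear reward model."""
    return sum(float(np.dot(reward_weights, s)) for s in trajectory)

def pairwise_ranking_loss(reward_weights, worse_traj, better_traj):
    """-log P(better ranked above worse) under a Bradley-Terry pairwise model."""
    r_worse = trajectory_return(reward_weights, worse_traj)
    r_better = trajectory_return(reward_weights, better_traj)
    # log-sum-exp for numerical stability
    m = max(r_worse, r_better)
    log_z = m + np.log(np.exp(r_worse - m) + np.exp(r_better - m))
    return log_z - r_better

# Toy check: a reward that already ranks the pair correctly incurs low loss.
w = np.array([1.0, 0.0])
worse = [np.array([0.0, 1.0])] * 3    # summed return 0 under w
better = [np.array([1.0, 0.0])] * 3   # summed return 3 under w
loss = pairwise_ranking_loss(w, worse, better)
```

Minimizing this loss over many ranked pairs pushes the learned reward to respect the demonstration ranking, after which a policy can be optimized against the learned reward.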

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences [article]

Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
2020 arXiv   pre-print
Bayesian REX also results in imitation learning performance that is competitive with or better than state-of-the-art methods that only learn point estimates of the reward function.  ...  Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning.  ...  Brown et al. (2019a) automatically generate preferences over demonstrations via noise injection, allowing better-than-demonstrator performance even in the absence of explicit preference labels.  ... 
arXiv:2002.09089v4 fatcat:vk6ebzm2ijesjdp3bahj5cjdgi

A Ranking Game for Imitation Learning [article]

Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
2022 arXiv   pre-print
We propose a new framework for imitation learning - treating imitation as a two-player ranking-based Stackelberg game between a policy and a reward function.  ...  We use insights from this analysis to further increase sample efficiency of the ranking game by using automatically generated rankings or offline annotated rankings.  ...  Learning purely from offline rankings in manipulation environments We compare with a prior method, T-REX (Brown et al., 2019), that learns purely from suboptimal preferences.  ... 
arXiv:2202.03481v1 fatcat:5eue7bjio5baflnlxlbqfbn6km

Deep Bayesian Reward Learning from Preferences [article]

Daniel S. Brown, Scott Niekum
2019 arXiv   pre-print
We evaluate our proposed approach on the task of learning to play Atari games via imitation learning from pixel inputs, with no access to the game score.  ...  We demonstrate that B-REX learns imitation policies that are competitive with a state-of-the-art deep imitation learning method that only learns a point estimate of the reward function.  ...  The reward function is then used to optimize a potentially better-than-demonstrator policy.  ... 
arXiv:1912.04472v1 fatcat:c2ouhzmearhupopywchlvp7ckq

Learning to Weight Imperfect Demonstrations

Yunke Wang, Chang Xu, Bo Du, Honglak Lee
2021 International Conference on Machine Learning  
Theoretical analysis suggests that with the estimated weights the agent can learn a better policy beyond the plain expert demonstrations.  ...  This paper investigates how to weight imperfect expert demonstrations for generative adversarial imitation learning (GAIL). The agent is expected to perform behaviors demonstrated by experts.  ...  Considering this rank as auxiliary information that may not be provided in ordinary IL tasks, D-REX (Brown et al., 2020) is proposed to automatically obtain this ranking.  ... 
dblp:conf/icml/WangXDL21 fatcat:bly2e6rpl5aolbivjjmvjuskby
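Several entries above reference D-REX's automatic ranking of demonstrations via noise injection: roll out an imitation policy with increasing amounts of random-action noise and rank the rollouts by noise level, on the assumption that less noise means a better trajectory. A minimal sketch of that idea, using a toy chain environment and names that are illustrative assumptions rather than the paper's code:

```python
import random

def rollout(policy, epsilon, n_actions=2, horizon=10, seed=0):
    """Epsilon-noisy rollout in a toy chain MDP where action 1 moves right."""
    rng = random.Random(seed)
    state, traj = 0, []
    for _ in range(horizon):
        # With probability epsilon take a random action, else follow the policy.
        a = rng.randrange(n_actions) if rng.random() < epsilon else policy(state)
        traj.append((state, a))
        state += 1 if a == 1 else 0
    return traj

def auto_ranked_demos(policy, noise_levels):
    """Return trajectories ordered worst-to-best by decreasing noise level."""
    return [rollout(policy, eps, seed=i)
            for i, eps in enumerate(sorted(noise_levels, reverse=True))]

demos = auto_ranked_demos(lambda s: 1, [0.0, 0.5, 1.0])
# demos[0] is the noisiest rollout, demos[-1] the cleanest (policy only).
```

The resulting ordered trajectories can then feed a pairwise reward-learning objective without any human preference labels, which is the connection the Bayesian REX and ranking-game snippets above draw on.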

Towards Uniformly Superhuman Autonomy via Subdominance Minimization

Brian D. Ziebart, Sanjiban Choudhury, Xinyan Yan, Paul Vernaza
2022 International Conference on Machine Learning  
This often prevents achieving expert-level or superhuman performance when identifying the better demonstrations to imitate is difficult.  ...  Prevalent imitation learning methods seek to produce behavior that matches or exceeds average human performance.  ...  Extensions that automatically learn to rank or provide significance weights assume that demonstration-based policy estimates have better rank than more random policies (Brown et al., 2020), that demonstrations  ... 
dblp:conf/icml/ZiebartCYV22 fatcat:ucowjgrtdfc6zfg3kfcpwkc6x4

An Auto-tuning Framework for Autonomous Vehicles [article]

Haoyang Fan, Zhongpu Xia, Changchun Liu, Yaqin Chen, Qi Kong
2018 arXiv   pre-print
The framework includes a novel rank-based conditional inverse reinforcement learning algorithm, an offline training strategy and an automatic method of collecting and labeling data.  ...  Finally, the motion planner tuned by the framework is examined via both simulation and public road testing and is shown to achieve good performance.  ...  Related Work Typically, two major approaches are used to develop such a map: learning via demonstration (imitation learning) or through optimizing the current reward/cost functional.  ... 
arXiv:1808.04913v1 fatcat:dlbidbwuwjh4hdo4zgcqwsknvy

Injective State-Image Mapping facilitates Visual Adversarial Imitation Learning [article]

Subhajit Chaudhury, Daiki Kimura, Asim Munawar, Ryuki Tachibana
2019 arXiv   pre-print
Furthermore, we show that our method can learn action policies by imitating video demonstrations on YouTube with similar performance to learned agents from true reward signals.  ...  learning on video demonstrations is equivalent to learning from the state trajectories.  ...  Our method consistently performed better than the other video imitation methods and was comparable to GAIL's performance which was trained on both state and action trajectories.  ... 
arXiv:1810.01108v2 fatcat:tifxqy7b6zb75kv4fbwet4ffzm

Own and others' prior experiences influence children's imitation of causal acts

Rebecca A. Williamson, Andrew N. Meltzoff
2011 Cognitive development  
Our findings support the idea that young children's imitative learning can be regulated and selective and is not limited to blind, automatic, or compulsory copying.  ...  (M = 1.28, mean rank = 48.13) than did children assigned to the easy experience group (M = .68, mean rank = 32.88), Mann-Whitney U = 495.0, p = .002, r = .35).  ... 
doi:10.1016/j.cogdev.2011.04.002 pmid:21966091 pmcid:PMC3181112 fatcat:fsuc3ghumbgnhfeypeglfxz4du

Scaled Autonomy: Enabling Human Operators to Control Robot Fleets [article]

Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan
2020 arXiv   pre-print
We learn a model of the user's preferences from observations of the user's choices in easy settings with a few robots, and use it in challenging settings with more robots to automatically identify which  ...  We also run a hardware demonstration that illustrates how our method can be applied to a real-world mobile robot navigation task.  ...  One explanation for this result is that collecting expert action demonstrations in challenging states leads to a better imitation policy π R than demonstrations in less challenging states.  ... 
arXiv:1910.02910v2 fatcat:v6npkw7isbdllhn5x3n362nnna

DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning

Ruiping Li, Xiang Cheng
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
DIVINE guides the path-finding process, and learns reasoning policies and reward functions self-adaptively through imitating the demonstrations automatically sampled from KGs.  ...  In this paper, we present DIVINE, a novel plug-and-play framework based on generative adversarial imitation learning for enhancing existing RL-based methods.  ...  as DIVINE for "Deep Inference via Imitating Non-human Experts".  ... 
doi:10.18653/v1/d19-1266 dblp:conf/emnlp/LiC19 fatcat:6ptaa4qdc5fuhcr5s2adzifkke

Hyperparameter Selection for Imitation Learning [article]

Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem (+2 others)
2021 arXiv   pre-print
We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous control, when the underlying reward function of the demonstrating expert cannot be observed  ...  The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting.  ...  The results also suggest that BC enjoys better HP transferability than AIL and PWIL, which can be expected as it is a much simpler algorithm based on supervised learning.  ... 
arXiv:2105.12034v1 fatcat:xndutr66kjd5dpa7trwl5xenh4

A Study of Causal Confusion in Preference-Based Reward Learning [article]

Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel Brown
2022 arXiv   pre-print
Learning robot policies via preference-based reward learning is an increasingly popular method for customizing robot behavior.  ...  While there is much anecdotal, empirical, and theoretical analysis of causal confusion and reward gaming behaviors both in reinforcement learning and imitation learning approaches that directly map from  ...  on par with, if not better than, the approach suggested by Brown et al.  ... 
arXiv:2204.06601v1 fatcat:wbjmjued4na2hoc53mqqtvs7zi

What's social about social learning?

Cecilia Heyes
2012 Journal of Comparative Psychology  
Even in the case of imitation, a type of social learning studied in both comparative psychology and cognitive science, there has been minimal contact between the two disciplines.  ...  Drawing on this evidence, I argue that social and asocial learning depend on the same basic learning mechanisms; these are adapted for the detection of predictive relationships in all natural domains;  ...  Automatic imitation effects of this kind have been found in more than 70 experiments, involving a range of action pairs (Heyes, in press).  ... 
doi:10.1037/a0025180 pmid:21895355 fatcat:miri2bltbzh2rjte7rilcej5xq