Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences [article]

Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
2020 arXiv   pre-print
feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference.  ...  Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning.  ...  We seek to remedy this problem by proposing and evaluating a method for safe and efficient Bayesian reward learning via preferences over demonstrations.  ... 
arXiv:2002.09089v4 fatcat:vk6ebzm2ijesjdp3bahj5cjdgi

Policy Gradient Bayesian Robust Optimization for Imitation Learning [article]

Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
2021 arXiv   pre-print
Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations  ...  The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations.  ...  AUTOLAB research is supported in part by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, the NSF National Robotics Initiative Award 1734633, and in part by donations from Google, Siemens  ... 
arXiv:2106.06499v2 fatcat:qpyrd4lrr5aqjl4kqeaf3z3fgi

Offline Preference-Based Apprenticeship Learning [article]

Daniel Shin, Daniel S. Brown, Anca D. Dragan
2022 arXiv   pre-print
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline  ...  Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment  ...  Unlike imitation learning, we do not assume that this dataset comes from a single expert attempting to optimize a specific reward function r(s, a).  ... 
arXiv:2107.09251v3 fatcat:szmstp3xpramppumkdxgqwwpcm

A Comprehensive Overview and Survey of Recent Advances in Meta-Learning [article]

Huimin Peng
2020 arXiv   pre-print
We briefly introduce meta-learning methodologies in the following categories: black-box meta-learning, metric-based meta-learning, layered meta-learning and Bayesian meta-learning framework.  ...  Meta-learning seeks adaptation of machine learning models to unseen tasks which are vastly different from trained tasks.  ...  From probabilistic perspective, meta-learning can be formulated under Bayesian inference framework.  ... 
arXiv:2004.11149v7 fatcat:ko266mr26jar3pyn6t4r3l5drm

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [article]

Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
2019 arXiv   pre-print
order to infer high-quality reward functions from a set of potentially poor demonstrations.  ...  In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in  ...  To evaluate the quality of our learned reward, we then trained a policy to maximize the inferred reward function via PPO.  ... 
arXiv:1904.06387v5 fatcat:rglnjfhb2zg5reugvmxgv4sofi

Active Inference in Robotics and Artificial Agents: Survey and Challenges [article]

Pablo Lanillos, Cristian Meo, Corrado Pezzato, Ajith Anil Meera, Mohamed Baioumy, Wataru Ohata, Alexander Tschantz, Beren Millidge, Martijn Wisse, Christopher L. Buckley, Jun Tani
2021 arXiv   pre-print
Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning.  ...  Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference  ...  From Bayesian Inference to the Free Energy Principle We will start by designing an agent that does not have access to the world/body state but has to infer it from the sensor measurements.  ... 
arXiv:2112.01871v1 fatcat:dux4iuejufb4bomn27eqv5rpea

AGI Safety Literature Review [article]

Tom Everitt, Gary Lea, Marcus Hutter
2018 arXiv   pre-print
We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation.  ...  On a fundamental level, learning from actions and learning from preferences is not widely different.  ...  (2017) use hierarchical Bayesian inference to infer a moral theory from actions, and Abel et al. (2016) suggest that POMDPs can be used as a formal framework for machine ethics.  ... 
arXiv:1805.01109v2 fatcat:v7vno74ngrcpljfj2exy5vf7rq

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning [article]

Hongseok Namkoong, Samuel Daulton, Eytan Bakshy
2020 arXiv   pre-print
We propose a novel imitation-learning-based algorithm that distills a TS policy into an explicit policy representation by performing posterior inference and optimization offline.  ...  Our algorithm iteratively performs offline batch updates to the TS policy and learns a new imitation policy.  ...  The policy receives a small positive reward (+5) for eating a safe mushroom, a large negative reward (-35) for eating an unsafe mushroom, and zero reward for abstaining.  ... 
arXiv:2011.14266v2 fatcat:fspq3k7trffy5lan2p4pevlzyq

A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress [article]

Saurabh Arora, Prashant Doshi
2020 arXiv   pre-print
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an agent, given its policy or observed behavior.  ...  We further discuss the extensions to traditional IRL methods for handling: inaccurate and incomplete perception, an incomplete model, multiple reward functions, and nonlinear reward functions.  ...  The image is reprinted from [10] with permission from publisher. 1. Learning from an expert to create an agent with the expert's preferences.  ... 
arXiv:1806.06877v3 fatcat:nutrudn22vdwvbirayrcder5pi

Planning and Decision-Making for Autonomous Vehicles

Wilko Schwarting, Javier Alonso-Mora, Daniela Rus
2018 Annual Review of Control, Robotics, and Autonomous Systems  
For instance, planning methods that provide safe and system-compliant performance in complex, cluttered environments while modeling the uncertain interaction with other traffic participants are required  ...  Furthermore, new paradigms, such as interactive planning and end-to-end learning, open up questions regarding safety and reliability that need to be addressed.  ...  The proposed method learns the reward function via feature-based IRL from expert demonstrations.  ... 
doi:10.1146/annurev-control-060117-105157 fatcat:hgrhw76idbbdrct742bbhnsqem

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Amarildo Likmeta, Alberto Maria Metelli, Giorgia Ramponi, Andrea Tirinzoni, Matteo Giuliani, Marcello Restelli
2021 Machine Learning  
In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications.  ...  the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake.  ...  IRL belongs to the broader class of Imitation Learning (IL, Osa et al. 2018) algorithms, whose high-level purpose is to "learn from demonstrations".  ... 
doi:10.1007/s10994-020-05939-8 fatcat:ec4jvh4bfng7bk7a57ydjcwfyi

Human-centric Dialog Training via Offline Reinforcement Learning [article]

Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard
2020 arXiv   pre-print
How can we train a dialog model to produce better conversations by learning from human feedback, without the risk of humans teaching it harmful chat behaviors?  ...  A well-known challenge is that learning an RL policy in an offline setting usually fails due to the lack of ability to explore and the tendency to make over-optimistic estimates of future reward.  ...  Hosting real-time conversations online The trained models were deployed to interact live with human users via a web server that hosts neural network dialog models on GPU for fast, real-time inference:  ... 
arXiv:2010.05848v1 fatcat:fxelzo2gubahrfjvk34jdwthfi

Adversarial Evaluation of Autonomous Vehicles in Lane-Change Scenarios [article]

Baiming Chen, Xiang Chen, Wu Qiong, Liang Li
2020 arXiv   pre-print
We then utilize a nonparametric Bayesian method to cluster the adversarial policies.  ...  We also illustrate different patterns of generated adversarial environments, which can be used to infer the weaknesses of the tested vehicles.  ...  To address this issue, we turn to Bayesian nonparametric models that automatically infer the model complexity from the data.  ... 
arXiv:2004.06531v2 fatcat:pe7qxglhjjej7dstyhusupcydu

Imitation Learning as f-Divergence Minimization [article]

Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
2020 arXiv   pre-print
We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all modes, we argue that in many tasks it is sufficient to imitate any one of them.  ...  We propose a general imitation learning framework for estimating and minimizing any f-Divergence.  ...  Imitation learning as f-divergence minimization Imitation learning is the process by which a learner tries to behave similarly to an expert based on inference from demonstrations or interactions.  ... 
arXiv:1905.12888v2 fatcat:ytygppppgzg5tl3ttjwyiwd5qi

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms [article]

Oliver Kroemer, Scott Niekum, George Konidaris
2020 arXiv   pre-print
We aim to survey a representative subset of that research which uses machine learning for manipulation.  ...  Learning will be central to such autonomous systems, as the real world contains too much variation for a robot to expect to have an accurate model of its environment, the objects in it, or the skills required  ...  The inferred reward function can then be optimized via reinforcement learning to learn a policy for the task. The IRL paradigm has several advantages.  ... 
arXiv:1907.03146v3 fatcat:2lmt6zpehfa3rj42a3k5kgd4ju