Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
[article]
2020
arXiv
pre-print
Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. ...
We seek to remedy this problem by proposing and evaluating a method for safe and efficient Bayesian reward learning via preferences over demonstrations. ...
... feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. ...
arXiv:2002.09089v4
fatcat:vk6ebzm2ijesjdp3bahj5cjdgi
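The recipe in the entry above (a fixed, pre-trained feature encoding plus preference-based Bayesian inference over a linear reward) can be sketched briefly. This is a minimal illustration under assumptions, not the paper's implementation: the feature map `phi`, the random-walk proposal, and the unit-norm constraint on weights are all assumed for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def traj_features(traj, phi):
    """Sum of pre-trained feature encodings over a trajectory's states."""
    return sum(phi(s) for s in traj)

def log_pref_likelihood(w, pref_pairs, feats):
    """Bradley-Terry likelihood of preferences: trajectory j preferred over i."""
    ll = 0.0
    for i, j in pref_pairs:
        ri, rj = w @ feats[i], w @ feats[j]
        ll += rj - np.logaddexp(ri, rj)  # log P(j preferred over i)
    return ll

def mcmc_reward_posterior(pref_pairs, feats, dim, n_samples=5000, step=0.05):
    """Random-walk Metropolis over unit-norm linear reward weights."""
    w = rng.normal(size=dim); w /= np.linalg.norm(w)
    ll = log_pref_likelihood(w, pref_pairs, feats)
    samples = []
    for _ in range(n_samples):
        w_new = w + step * rng.normal(size=dim)
        w_new /= np.linalg.norm(w_new)           # stay on the unit sphere
        ll_new = log_pref_likelihood(w_new, pref_pairs, feats)
        if np.log(rng.uniform()) < ll_new - ll:  # uniform prior on the sphere
            w, ll = w_new, ll_new
        samples.append(w.copy())
    return np.array(samples)
```

Because trajectory features are precomputed once, each MCMC step costs only a few dot products, which is where the "fast" in the title comes from.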
Policy Gradient Bayesian Robust Optimization for Imitation Learning
[article]
2021
arXiv
pre-print
Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations ...
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. ...
AUTOLAB research is supported in part by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, the NSF National Robotics Initiative Award 1734633, and in part by donations from Google, Siemens ...
arXiv:2106.06499v2
fatcat:qpyrd4lrr5aqjl4kqeaf3z3fgi
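The risk-neutral-to-risk-averse spectrum mentioned above typically comes from blending expected return with a tail-risk measure such as CVaR over a posterior of reward hypotheses. A hedged sketch of such a weighting, assuming per-hypothesis policy returns are already estimated; the blend form and parameter names are illustrative rather than taken from the paper.

```python
import numpy as np

def cvar_weights(returns_per_hypothesis, alpha=0.95, lam=0.5):
    """
    Risk-sensitive weighting over posterior reward hypotheses in the spirit
    of a BROIL-style objective: lam * mean + (1 - lam) * CVaR_alpha.
    returns_per_hypothesis[k] is the policy's expected return under the k-th
    posterior reward sample; the output weights can scale per-hypothesis
    policy gradients. lam = 1 is risk-neutral, lam = 0 is maximally averse.
    """
    r = np.asarray(returns_per_hypothesis, dtype=float)
    n = len(r)
    sigma = np.quantile(r, 1 - alpha)            # VaR threshold (left tail)
    tail = r <= sigma                            # worst-case hypotheses
    w = np.full(n, lam / n)                      # risk-neutral part
    w[tail] += (1 - lam) / max(tail.sum(), 1)    # extra mass on the tail
    return w
```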
Offline Preference-Based Apprenticeship Learning
[article]
2022
arXiv
pre-print
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline ...
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment ...
Unlike imitation learning, we do not assume that this dataset comes from a single expert attempting to optimize a specific reward function r(s, a). ...
arXiv:2107.09251v3
fatcat:szmstp3xpramppumkdxgqwwpcm
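One common way to "craft preference queries via pool-based active learning" is to query the trajectory pair the current reward posterior disagrees about most. A sketch under that assumption; the paper may use a different acquisition criterion, and all names here are illustrative.

```python
import numpy as np
from itertools import combinations

def select_query(posterior_w, feats, pool):
    """
    Pick the trajectory pair from an offline pool whose preference the
    reward posterior is most uncertain about: the posterior 'vote' for
    one trajectory over the other is closest to 0.5.
    posterior_w: (K, d) array of posterior weight samples.
    feats: per-trajectory feature vectors of dimension d.
    """
    best_pair, best_score = None, -1.0
    for i, j in combinations(pool, 2):
        # fraction of posterior samples preferring j over i
        p = np.mean(posterior_w @ (feats[j] - feats[i]) > 0)
        score = 1 - abs(p - 0.5) * 2  # 1 at maximal disagreement, 0 at consensus
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair
```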
A Comprehensive Overview and Survey of Recent Advances in Meta-Learning
[article]
2020
arXiv
pre-print
We briefly introduce meta-learning methodologies in the following categories: black-box meta-learning, metric-based meta-learning, layered meta-learning and Bayesian meta-learning framework. ...
Meta-learning seeks adaptation of machine learning models to unseen tasks which are vastly different from trained tasks. ...
From a probabilistic perspective, meta-learning can be formulated within a Bayesian inference framework. ...
arXiv:2004.11149v7
fatcat:ko266mr26jar3pyn6t4r3l5drm
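The Bayesian formulation alluded to in the last snippet is usually written hierarchically. A standard rendering, not the survey's specific notation:

```latex
% Hierarchical Bayesian view of meta-learning: meta-parameters \theta are
% shared across tasks; task-specific parameters \phi_i adapt to each task i.
p(\phi_i \mid \mathcal{D}_i^{\mathrm{tr}}, \theta)
  \propto p(\mathcal{D}_i^{\mathrm{tr}} \mid \phi_i)\, p(\phi_i \mid \theta)

% Prediction on a new task marginalizes over the adapted parameters:
p(y^* \mid x^*, \mathcal{D}^{\mathrm{tr}}, \theta)
  = \int p(y^* \mid x^*, \phi)\, p(\phi \mid \mathcal{D}^{\mathrm{tr}}, \theta)\, d\phi
```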
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
[article]
2019
arXiv
pre-print
In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. ...
To evaluate the quality of our learned reward, we then trained a policy to maximize the inferred reward function via PPO. ...
arXiv:1904.06387v5
fatcat:rglnjfhb2zg5reugvmxgv4sofi
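The extrapolation mechanism rests on a Bradley-Terry ranking loss over (approximately) ranked trajectories. A minimal numpy sketch with a linear reward for brevity; T-REX itself trains a neural-network reward over observations by gradient descent on the same loss.

```python
import numpy as np

def trex_ranking_loss(w, feats, ranked_pairs):
    """
    T-REX-style loss: for each pair (i, j) with trajectory j ranked above i,
    penalize -log P(j > i) under a Bradley-Terry model over cumulative
    reward. feats holds precomputed per-trajectory feature sums.
    """
    loss = 0.0
    for i, j in ranked_pairs:
        ri, rj = w @ feats[i], w @ feats[j]
        loss += np.logaddexp(ri, rj) - rj  # = -log softmax(rj over {ri, rj})
    return loss / len(ranked_pairs)
```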
Active Inference in Robotics and Artificial Agents: Survey and Challenges
[article]
2021
arXiv
pre-print
Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning. ...
Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference ...
From Bayesian Inference to the Free Energy Principle: We will start by designing an agent that does not have access to the world/body state but has to infer it from the sensor measurements. ...
arXiv:2112.01871v1
fatcat:dux4iuejufb4bomn27eqv5rpea
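The variational free energy that such an agent minimizes to infer hidden states can be computed directly for a discrete state space. A minimal sketch with distributions as plain arrays; the function and argument names are illustrative.

```python
import numpy as np

def free_energy(q, prior, likelihood, obs):
    """
    Variational free energy for a discrete hidden state:
    F = E_q[ln q(s)] - E_q[ln p(o, s)]
      = KL(q(s) || p(s)) - E_q[ln p(o | s)].
    Minimizing F over q performs approximate Bayesian state inference.
    q: (n_states,) variational belief; prior: (n_states,) p(s);
    likelihood: (n_obs, n_states) p(o | s); obs: observed index.
    """
    q = np.asarray(q, dtype=float)
    ln_joint = np.log(likelihood[obs]) + np.log(prior)  # ln p(o, s) per state
    return float(np.sum(q * (np.log(q) - ln_joint)))
```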
AGI Safety Literature Review
[article]
2018
arXiv
pre-print
We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. ...
On a fundamental level, learning from actions and learning from preferences are not widely different. ...
(2017) use hierarchical Bayesian inference to infer a moral theory from actions, and Abel et al. (2016) suggest that POMDPs can be used as a formal framework for machine ethics. ...
arXiv:1805.01109v2
fatcat:v7vno74ngrcpljfj2exy5vf7rq
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
[article]
2020
arXiv
pre-print
We propose a novel imitation-learning-based algorithm that distills a TS policy into an explicit policy representation by performing posterior inference and optimization offline. ...
Our algorithm iteratively performs offline batch updates to the TS policy and learns a new imitation policy. ...
The policy receives a small positive reward (+5) for eating a safe mushroom, a large negative reward (-35) for eating an unsafe mushroom, and zero reward for abstaining. ...
arXiv:2011.14266v2
fatcat:fspq3k7trffy5lan2p4pevlzyq
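The mushroom example fixes the reward structure exactly (+5 / -35 / 0), which makes the underlying Thompson sampling step easy to illustrate. A sketch assuming a Beta posterior over the probability that a mushroom type is safe; the paper's contextual model and its distillation loop are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
R_SAFE, R_UNSAFE, R_ABSTAIN = 5.0, -35.0, 0.0  # rewards from the paper's example

def ts_eat_decision(alpha, beta):
    """
    One Thompson sampling step for the eat/abstain choice: sample a belief
    about the probability p that this mushroom type is safe, then eat only
    if the sampled expected reward of eating beats abstaining.
    (alpha, beta) are Beta posterior counts of safe/unsafe outcomes.
    """
    p = rng.beta(alpha, beta)                     # posterior sample
    expected_eat = p * R_SAFE + (1 - p) * R_UNSAFE
    return expected_eat > R_ABSTAIN               # True => eat
```

Distillation then amounts to fitting an explicit policy to imitate many such sampled decisions offline, rather than re-running posterior inference at serving time.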
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
[article]
2020
arXiv
pre-print
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an agent, given its policy or observed behavior. ...
We further discuss the extensions to traditional IRL methods for handling: inaccurate and incomplete perception, an incomplete model, multiple reward functions, and nonlinear reward functions. ...
The image is reprinted from [10] with permission from the publisher. 1. Learning from an expert to create an agent with the expert's preferences. ...
arXiv:1806.06877v3
fatcat:nutrudn22vdwvbirayrcder5pi
Planning and Decision-Making for Autonomous Vehicles
2018
Annual Review of Control Robotics and Autonomous Systems
For instance, planning methods that provide safe and system-compliant performance in complex, cluttered environments while modeling the uncertain interaction with other traffic participants are required ...
Furthermore, new paradigms, such as interactive planning and end-to-end learning, open up questions regarding safety and reliability that need to be addressed. ...
The proposed method learns the reward function via feature-based IRL from expert demonstrations. ...
doi:10.1146/annurev-control-060117-105157
fatcat:hgrhw76idbbdrct742bbhnsqem
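Feature-based IRL, as referenced in the last snippet, typically means a linear reward over hand-designed features fit by matching feature expectations. A generic one-step sketch, not the review's specific method; the learning rate and names are assumptions.

```python
import numpy as np

def maxent_irl_weight_update(w, expert_feat_exp, policy_feat_exp, lr=0.1):
    """
    One gradient step of maximum-entropy, feature-based IRL: move the linear
    reward weights so the learner's expected feature counts under its current
    policy match the expert's empirical feature counts.
    """
    grad = expert_feat_exp - policy_feat_exp  # gradient of the log-likelihood
    return w + lr * grad
```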
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
2021
Machine Learning
In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. ...
the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. ...
IRL belongs to the broader class of Imitation Learning (IL, Osa et al. 2018 ) algorithms, whose high-level purpose is to "learn from demonstrations". ...
doi:10.1007/s10994-020-05939-8
fatcat:ec4jvh4bfng7bk7a57ydjcwfyi
Human-centric Dialog Training via Offline Reinforcement Learning
[article]
2020
arXiv
pre-print
How can we train a dialog model to produce better conversations by learning from human feedback, without the risk of humans teaching it harmful chat behaviors? ...
A well-known challenge is that learning an RL policy in an offline setting usually fails due to the lack of ability to explore and the tendency to make over-optimistic estimates of future reward. ...
Hosting real-time conversations online: The trained models were deployed to interact live with human users via a web server that hosts neural network dialog models on GPU for fast, real-time inference. ...
arXiv:2010.05848v1
fatcat:fxelzo2gubahrfjvk34jdwthfi
Adversarial Evaluation of Autonomous Vehicles in Lane-Change Scenarios
[article]
2020
arXiv
pre-print
We then utilize a nonparametric Bayesian method to cluster the adversarial policies. ...
We also illustrate different patterns of generated adversarial environments, which can be used to infer the weaknesses of the tested vehicles. ...
To address this issue, we turn to Bayesian nonparametric models that automatically infer the model complexity from the data. ...
arXiv:2004.06531v2
fatcat:pe7qxglhjjej7dstyhusupcydu
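The "automatically infer the model complexity" property comes from Dirichlet-process-style priors such as the Chinese restaurant process. A sketch of a CRP prior draw; the paper's full clustering model additionally scores adversarial policies under a likelihood, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_assignments(n, alpha=1.0):
    """
    Chinese restaurant process draw: the prior behind Dirichlet-process
    mixtures, which let the number of clusters grow with the data rather
    than being fixed in advance. Each new item joins an existing cluster
    with probability proportional to its size, or opens a new cluster
    with probability proportional to alpha.
    """
    assignments, counts = [0], [1]
    for _ in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments
```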
Imitation Learning as f-Divergence Minimization
[article]
2020
arXiv
pre-print
We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all modes, we argue that in many tasks it is sufficient to imitate any one of them. ...
We propose a general imitation learning framework for estimating and minimizing any f-Divergence. ...
Imitation learning as f-divergence minimization Imitation learning is the process by which a learner tries to behave similarly to an expert based on inference from demonstrations or interactions. ...
arXiv:1905.12888v2
fatcat:ytygppppgzg5tl3ttjwyiwd5qi
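For reference, the family being minimized: an f-divergence between expert and learner state-action occupancies, with familiar special cases. Standard definitions, not notation from the paper.

```latex
% f-divergence between the expert's and the learner's state-action
% occupancy distributions; different convex f (with f(1) = 0) recover
% familiar imitation objectives.
D_f\!\left(\rho_{\mathrm{exp}} \,\|\, \rho_{\pi}\right)
  = \mathbb{E}_{(s,a)\sim \rho_{\pi}}\!\left[ f\!\left(
      \frac{\rho_{\mathrm{exp}}(s,a)}{\rho_{\pi}(s,a)} \right)\right]

% f(u) = u \log u  gives forward KL (mode-covering, behavior-cloning-like)
% f(u) = -\log u   gives reverse KL (mode-seeking)
% f(u) = \tfrac{u}{2}\log u - \tfrac{1+u}{2}\log\tfrac{1+u}{2}
%                  gives Jensen-Shannon (the GAIL objective)
```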
A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms
[article]
2020
arXiv
pre-print
We aim to survey a representative subset of that research which uses machine learning for manipulation. ...
Learning will be central to such autonomous systems, as the real world contains too much variation for a robot to expect to have an accurate model of its environment, the objects in it, or the skills required ...
The inferred reward function can then be optimized via reinforcement learning to learn a policy for the task. The IRL paradigm has several advantages. ...
arXiv:1907.03146v3
fatcat:2lmt6zpehfa3rj42a3k5kgd4ju
Showing results 1 — 15 out of 585 results