Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences
[article]
2021
arXiv
pre-print
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. ...
In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on ...
Acknowledgments This work is supported by FLI grant RFP2-000, and NSF Awards #1849952 and #1941722. ...
arXiv:2006.14091v2
fatcat:5bvuyqpte5hifaluozij2jnmla
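The two-stage recipe this entry describes (demonstrations initialize a Bayesian belief over reward parameters, preference queries then refine it) can be sketched with a small particle-based example. This is an illustration only, not the paper's implementation: the linear reward model, the rationality constant beta, and the maximum-uncertainty query heuristic standing in for the paper's active querying criterion are all assumptions.

```python
# Minimal sketch: demonstrations seed a belief over reward weights, then actively
# chosen preference queries refine it. All constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 4                      # number of trajectory features (assumed)
beta = 5.0                 # rationality (inverse temperature) of the simulated user

# Candidate trajectories, each summarized by a feature vector.
candidates = rng.normal(size=(50, d))

# Belief over reward weights: particles on the unit sphere with log importance weights.
particles = rng.normal(size=(2000, d))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
log_w = np.zeros(len(particles))

def update_from_demo(demo_phi):
    """Boltzmann likelihood: the demonstration is soft-optimal among the candidates."""
    global log_w
    scores = beta * particles @ candidates.T                 # (particles, candidates)
    log_w += beta * particles @ demo_phi - np.log(np.exp(scores).sum(axis=1))

def update_from_preference(phi_a, phi_b, a_preferred):
    """Bradley-Terry / logistic preference likelihood."""
    global log_w
    p_a = 1.0 / (1.0 + np.exp(-beta * particles @ (phi_a - phi_b)))
    log_w += np.log(p_a if a_preferred else 1.0 - p_a)

def select_query():
    """Pick the candidate pair whose predicted preference is most uncertain under the
    current belief -- a simple stand-in for the paper's active querying criterion."""
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    best, best_gap = (0, 1), np.inf
    for _ in range(200):                                     # random pair search
        i, j = rng.choice(len(candidates), size=2, replace=False)
        diff = particles @ (candidates[i] - candidates[j])
        p = (w * (1.0 / (1.0 + np.exp(-beta * diff)))).sum()
        if abs(p - 0.5) < best_gap:
            best, best_gap = (i, j), abs(p - 0.5)
    return best

# Stage 1: initialize the belief from a (synthetic) demonstration.
true_w = np.array([1.0, -0.5, 0.2, 0.0]); true_w /= np.linalg.norm(true_w)
update_from_demo(candidates[np.argmax(candidates @ true_w)])

# Stage 2: actively query simulated pairwise preferences.
for _ in range(10):
    i, j = select_query()
    update_from_preference(candidates[i], candidates[j],
                           a_preferred=(candidates[i] - candidates[j]) @ true_w > 0)

w = np.exp(log_w - log_w.max()); w /= w.sum()
print("posterior mean reward weights:", (w[:, None] * particles).sum(axis=0))
```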
APReL: A Library for Active Preference-based Reward Learning Algorithms
[article]
2022
arXiv
pre-print
Reward learning is a fundamental problem in human-robot interaction to have robots that operate in alignment with what their human user wants. ...
In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enable researchers and practitioners to experiment with the existing techniques and easily develop ...
We use these preferences and rankings to learn the human's reward function. Demonstrations: expert demonstrations of the optimal behavior may also be available to initialize the learning process. ...
arXiv:2108.07259v2
fatcat:lbvicxy2l5fxxitliuk5y2fh24
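The entry above notes that rankings as well as pairwise preferences can drive the belief update, with demonstrations available for initialization. Below is a generic sketch of how a full ranking enters the likelihood, via a Plackett-Luce model over an assumed linear reward; this is not APReL's API, only an illustration of the underlying update.

```python
# Generic sketch (not APReL's API): a ranking over several trajectories updates a
# particle belief over linear reward weights with a Plackett-Luce likelihood.
# Feature dimensions and the rationality constant are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 4
particles = rng.normal(size=(3000, d))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
log_w = np.zeros(len(particles))

def ranking_log_likelihood(w, ranked_features, beta=5.0):
    """Plackett-Luce: the best item is chosen first among all items, the second-best
    among the remaining ones, and so on. `ranked_features` is ordered best-to-worst."""
    utilities = beta * w @ ranked_features.T          # (particles, items)
    ll = np.zeros(len(w))
    for k in range(utilities.shape[1] - 1):
        rest = utilities[:, k:]
        ll += rest[:, 0] - np.log(np.exp(rest).sum(axis=1))
    return ll

# A synthetic ranking of four trajectories, each summarized by its features.
ranked = rng.normal(size=(4, d))                      # best-to-worst order
log_w += ranking_log_likelihood(particles, ranked)

# Demonstrations, as the entry notes, can seed the same belief before any queries,
# e.g. with a Boltzmann likelihood over a candidate set (omitted here for brevity).
w = np.exp(log_w - log_w.max()); w /= w.sum()
print("posterior mean weights:", (w[:, None] * particles).sum(axis=0))
```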
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
[article]
2019
arXiv
pre-print
Furthermore, we compare the similarity of human driven trajectories with optimal policies of our planner under learned and expert-tuned reward functions. ...
Manually tuning this reward function becomes a tedious task. In this paper, we propose an approach that relies on human driving demonstrations to automatically tune reward functions. ...
Difference between expected value of human driving demonstration and expected value of planner policies under learned reward functions. ...
arXiv:1905.00229v1
fatcat:ltcqbfrnyndu7hgirt3y7spm4q
How to talk so your robot will learn: Instructions, descriptions, and pragmatics
[article]
2022
arXiv
pre-print
To address this challenge, we consider social learning in a linear bandit setting and ask how a human might communicate preferences over behaviors (i.e. the reward function). ...
We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts spontaneous human behavior, and (2) our pragmatic listener is able to recover their reward functions ...
TRS is supported by the NDSEG Fellowship Program and RDH is supported by the NSF (grant #1911835). ...
arXiv:2206.07870v1
fatcat:nimwgsqg5zbufdj4ucyfcf5bsm
Active Preference-Based Gaussian Process Regression for Reward Learning
[article]
2020
arXiv
pre-print
One common approach is to learn reward functions from collected expert demonstrations. ...
Instead, we model the reward function using a Gaussian Process (GP) and propose a mathematical formulation to actively find a GP using only human preferences. ...
Katz, Amir Maleki and Juan Carlos Aragon for the discussions on alternative ways to ease feature design. We acknowledge funding by Allstate, FLI grant RFP2-000, and NSF grants #1941722 and #1849952. ...
arXiv:2005.02575v2
fatcat:q6wl6jrkmrahbna3ysipcfg3cm
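As a rough illustration of the non-parametric idea in this entry (not the paper's exact formulation or query-selection rule), the reward values at a fixed set of candidate points can be given a joint Gaussian Process prior and reweighted by a logistic preference likelihood; the kernel hyperparameters and the simulated user below are assumptions.

```python
# Minimal sketch: reward values over candidate points share a GP prior (RBF kernel);
# prior samples are reweighted by a Bradley-Terry preference likelihood.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(30, 2))          # candidate points in feature space

def rbf_kernel(A, B, length=0.5, var=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * sq / length ** 2)

K = rbf_kernel(X, X) + 1e-6 * np.eye(len(X))
L = np.linalg.cholesky(K)

# Monte Carlo "belief": samples of the latent reward vector f ~ GP prior.
F = (L @ rng.normal(size=(len(X), 3000))).T   # (samples, points)
log_w = np.zeros(len(F))

true_f = np.sin(3 * X[:, 0]) - X[:, 1]        # hidden reward used to answer queries

def update(i, j, i_preferred, beta=5.0):
    """Logistic (Bradley-Terry) preference likelihood on the latent rewards."""
    global log_w
    p_i = 1.0 / (1.0 + np.exp(-beta * (F[:, i] - F[:, j])))
    log_w += np.log(p_i if i_preferred else 1.0 - p_i)

for _ in range(40):                            # random queries; the paper selects them actively
    i, j = rng.choice(len(X), size=2, replace=False)
    update(i, j, bool(true_f[i] > true_f[j]))

w = np.exp(log_w - log_w.max()); w /= w.sum()
post_mean = (w[:, None] * F).sum(axis=0)
print("correlation with true reward:", np.corrcoef(post_mean, true_f)[0, 1])
```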
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation
2020
Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction
On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. ...
adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. ...
Therefore, strategy-only rewards learned by MSRD capture specific preferences within demonstrations. ...
doi:10.1145/3319502.3374791
dblp:conf/hri/ChenPGG20
fatcat:ghq3dmz5d5a6jobw2x2ucvc6ju
Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach
[article]
2020
arXiv
pre-print
A more practical approach is to replace human demonstrations by human queries, i.e., preference-based reinforcement learning. ...
continuous, high-dimensional reward function. ...
Integrating human demonstrations and preferences fits a recent trend of communicating complex personalized objectives to deep learning systems in, e.g., inverse reinforcement learning [6], [19], [ ...
arXiv:2010.07467v1
fatcat:kscg6oykwvdchidrlt4iv5vrqa
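The preference-based reinforcement learning setup this entry builds on typically trains a neural reward model on pairwise segment comparisons with a Bradley-Terry cross-entropy loss. A minimal sketch of that standard step follows; it does not reproduce the paper's GAN-assisted method, and the network sizes and synthetic comparison data are placeholders.

```python
# Minimal sketch of the standard preference-based reward-learning step: a neural
# reward model over state-action pairs is trained on pairwise segment comparisons.
import torch
import torch.nn as nn

obs_dim, act_dim, seg_len = 8, 2, 25

reward_net = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(reward_net.parameters(), lr=3e-4)

def segment_return(segments):
    """Sum predicted per-step rewards over each segment: (batch, seg_len, obs+act) -> (batch,)."""
    return reward_net(segments).squeeze(-1).sum(dim=1)

def preference_loss(seg_a, seg_b, prefer_a):
    """Bradley-Terry likelihood: P(a preferred over b) = softmax of summed predicted rewards."""
    logits = torch.stack([segment_return(seg_a), segment_return(seg_b)], dim=1)
    labels = (~prefer_a).long()          # index of the preferred segment: 0 if a, 1 if b
    return nn.functional.cross_entropy(logits, labels)

# Synthetic batch of labeled comparisons (placeholders for real human queries).
seg_a = torch.randn(32, seg_len, obs_dim + act_dim)
seg_b = torch.randn(32, seg_len, obs_dim + act_dim)
prefer_a = torch.rand(32) < 0.5

for _ in range(100):
    loss = preference_loss(seg_a, seg_b, prefer_a)
    opt.zero_grad(); loss.backward(); opt.step()

# The learned reward_net can then serve as the reward signal for any RL algorithm.
```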
Simultaneous Learning of Objective Function and Policy from Interactive Teaching with Corrective Feedback
2019
2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)
Some imitation learning approaches rely on Inverse Reinforcement Learning (IRL) methods, to decode and generalize implicit goals given by expert demonstrations. ...
Experimental results show that the learned reward functions obtain similar performance in RL processes compared to engineered reward functions used as baseline, both in simulated and real environments. ...
In Algorithm 3, the steps that integrate the incremental process of learning policies and objective functions with vague human corrections are listed. ...
doi:10.1109/aim.2019.8868805
dblp:conf/aimech/CeleminK19
fatcat:x7dhf4rtaffefk4waqnpdlibxe
Reinforcement Learning With Human Advice: A Survey
2021
Frontiers in Robotics and AI
In this paper, we provide an overview of the existing methods for integrating human advice into a reinforcement learning process. ...
Finally, we review different approaches for integrating advice into the learning process. ...
ACKNOWLEDGMENTS This work was supported by the Romeo2 project. This manuscript has been released as a pre-print at arXiv (Najar and Chetouani, 2020). ...
doi:10.3389/frobt.2021.584075
pmid:34141726
pmcid:PMC8205518
fatcat:fqipip7cp5hvlo22xsonqeqzcq
Reinforcement learning with human advice: a survey
[article]
2020
arXiv
pre-print
In this paper, we provide an overview of the existing methods for integrating human advice into a Reinforcement Learning process. ...
Finally, we review different approaches for integrating advice into the learning process. ...
Acknowledgments This work was supported by the Romeo2 project. ...
arXiv:2005.11016v2
fatcat:kvomaemvrzfq3lebfewnn4rdqq
Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks
2015
Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI '15
These demonstrated sequences are also used by the robot to learn a reward function that is representative for each type, through the employment of an inverse reinforcement learning algorithm. ...
of this new user and will be robust to deviations of the human actions from prior demonstrations. ...
We posited that the reward function learned by the proposed framework for each cluster of demonstrated action sequences would accurately represent the goals of participants with that preference, and would ...
doi:10.1145/2696454.2696455
dblp:conf/hri/NikolaidisRGS15
fatcat:yuzba4bg3zhv3jn3yzm6mgh6ve
Towards Intrinsic Interactive Reinforcement Learning
[article]
2022
arXiv
pre-print
With the rising interest in human-in-the-loop (HITL) applications, RL algorithms have been adapted to account for human guidance, giving rise to the sub-field of interactive reinforcement learning (IRL) ...
These two ideas have set RL and BCI on a collision course with one another through the integration of BCI into the IRL framework, where intrinsic feedback can be utilized to help train an agent. ...
Millán, Brad Knox, and Peter Stone for taking the time to provide valuable feedback. ...
arXiv:2112.01575v2
fatcat:orpakgcvbrdh5ddhpmpbtxneau
Learning from Richer Human Guidance
2018
Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction - HRI '18
We focus on learning the desired objective function for a robot. Although trajectory demonstrations can be very informative of the desired objective, they can also be difficult for users to provide. ...
We focus on augmenting comparisons with feature queries, and introduce a unified formalism for treating all answers as observations about the true desired reward. ...
This is evidenced by them preferring the robot that optimizes the reward function learned through rich queries over the robot that optimizes the reward learned through comparison-only queries. ...
doi:10.1145/3171221.3171284
dblp:conf/hri/BasuSD18
fatcat:afot7cclebgsvfnccewmxtthaq
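The unified formalism mentioned in this entry, in which every answer (a trajectory comparison or a direct feature query) is treated as an observation about the same underlying reward, can be illustrated with a single shared update rule. The dimensions, noise model, and feature-query semantics below are illustrative assumptions, not the paper's model.

```python
# Illustration only: two query types share one belief update over reward weights w.
import numpy as np

rng = np.random.default_rng(2)
d = 5
particles = rng.normal(size=(5000, d))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
log_w = np.zeros(len(particles))

def observe(log_likelihood):
    """Single update rule shared by all query types."""
    global log_w
    log_w += log_likelihood(particles)

def comparison_likelihood(phi_a, phi_b, a_preferred, beta=5.0):
    """Answer to 'do you prefer trajectory A or B?'"""
    def ll(w):
        p = 1.0 / (1.0 + np.exp(-beta * w @ (phi_a - phi_b)))
        return np.log(p if a_preferred else 1.0 - p)
    return ll

def feature_query_likelihood(k, answer_positive, beta=5.0):
    """Answer to 'should feature k make the reward higher (+) or lower (-)?'"""
    def ll(w):
        p = 1.0 / (1.0 + np.exp(-beta * w[:, k]))
        return np.log(p if answer_positive else 1.0 - p)
    return ll

# Both kinds of answers pass through the same update:
phi_a, phi_b = rng.normal(size=d), rng.normal(size=d)
observe(comparison_likelihood(phi_a, phi_b, a_preferred=True))
observe(feature_query_likelihood(k=2, answer_positive=False))

w = np.exp(log_w - log_w.max()); w /= w.sum()
print("posterior mean weights:", (w[:, None] * particles).sum(axis=0))
```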
Comparative Analysis of Human Movement Prediction: Space Syntax and Inverse Reinforcement Learning
[article]
2018
arXiv
pre-print
In this paper, comparative analysis of space syntax metrics and maximum entropy inverse reinforcement learning (MEIRL) is performed. ...
Space syntax metrics have been the main approach for human movement prediction in the urban environment. ...
The acquired reward function explains the intrinsic preferences of the policy demonstrators, or expert agents. ...
arXiv:1801.00464v2
fatcat:w4nuizhdabatndwf2lmc5qcoqi
Co-active Learning to Adapt Humanoid Movement for Manipulation
[article]
2016
arXiv
pre-print
The framework also considers the user's feedback on the adapted trajectories, and it learns to adapt movement through human-in-the-loop interactions. ...
It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints. ...
Although this reward function can be recovered from demonstrations by Inverse Optimal Control, as [13] suggests, this assumes that demonstrations come from experts, i.e., that they reflect an oracle reward function ...
arXiv:1609.03628v1
fatcat:2dulshpiuvhinn7lhi4ibnw26q
Showing results 1 — 15 out of 121,286 results