
Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences [article]

Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
2021 arXiv   pre-print
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers.  ...  In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on  ...  Acknowledgments This work is supported by FLI grant RFP2-000, and NSF Awards #1849952 and #1941722.  ... 
arXiv:2006.14091v2 fatcat:5bvuyqpte5hifaluozij2jnmla
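The two-stage recipe this abstract describes — initialize a belief over reward parameters from demonstrations, then refine it with preference queries — can be sketched roughly as follows. The linear reward model, trajectory features, alternatives, and rationality constant `beta` are all illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

beta = 5.0  # assumed rationality constant for the Boltzmann choice model

# Candidate reward weights on the unit circle; reward(traj) = w . phi(traj).
angles = np.linspace(0, 2 * np.pi, 100, endpoint=False)
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def traj_features(traj):
    return traj.sum(axis=0)  # sum of per-step feature vectors

# Stage 1: Boltzmann-rational likelihood of the demonstrated trajectory
# against two alternatives initializes the belief over weights.
demo = np.array([[0.9, 0.1], [0.8, 0.2]])
alt1 = np.array([[0.1, 0.9], [0.2, 0.8]])
alt2 = np.array([[0.5, 0.5], [0.4, 0.6]])
phis = np.stack([traj_features(t) for t in (demo, alt1, alt2)])

logits = beta * (W @ phis.T)                      # shape (100, 3)
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
belief = p[:, 0] / p[:, 0].sum()                  # P(w | demo was chosen)

# Stage 2: each preference query multiplies in a logistic likelihood.
def update_with_preference(belief, phi_preferred, phi_other):
    like = 1.0 / (1.0 + np.exp(-beta * (W @ (phi_preferred - phi_other))))
    post = belief * like
    return post / post.sum()

belief = update_with_preference(belief, phis[0], phis[2])
best_w = W[np.argmax(belief)]                     # MAP reward direction
```

With informative queries, the belief concentrates on weight directions that explain both the demonstration and the stated preferences.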

APReL: A Library for Active Preference-based Reward Learning Algorithms [article]

Erdem Bıyık, Aditi Talati, Dorsa Sadigh
2022 arXiv   pre-print
Reward learning is a fundamental problem in human-robot interaction to have robots that operate in alignment with what their human user wants.  ...  In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enable researchers and practitioners to experiment with the existing techniques and easily develop  ...  We use these preferences and rankings to learn the human's reward function. Demonstrations. Expert demonstrations of the optimal behavior may also be available to initialize the learning process.  ... 
arXiv:2108.07259v2 fatcat:lbvicxy2l5fxxitliuk5y2fh24
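At the core of any such library is fitting a reward model to pairwise choices. A minimal stand-in for that step, assuming a linear reward and a Bradley-Terry (logistic) choice model — synthetic data throughout, not APReL's actual API:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])   # hidden "human" reward weights

# Synthetic queries: pairs of trajectory feature vectors and the
# human's noisily rational choice between them.
pairs = rng.normal(size=(200, 2, 2))
diffs = pairs[:, 0] - pairs[:, 1]
choices = (rng.random(200) < 1 / (1 + np.exp(-diffs @ w_true))).astype(float)

# Gradient ascent on the Bradley-Terry log-likelihood.
w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-diffs @ w))
    w += 0.1 * diffs.T @ (choices - p) / len(diffs)
```

The recovered `w` should point in roughly the same direction as `w_true`; active-querying methods differ mainly in how the next pair is chosen, not in this fitting step.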

Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving [article]

Sascha Rosbach, Vinit James, Simon Großjohann, Silviu Homoceanu, Stefan Roth
2019 arXiv   pre-print
Furthermore, we compare the similarity of human driven trajectories with optimal policies of our planner under learned and expert-tuned reward functions.  ...  Manually tuning this reward function becomes a tedious task. In this paper, we propose an approach that relies on human driving demonstrations to automatically tune reward functions.  ...  Difference between expected value of human driving demonstration and expected value of planner policies under learned reward functions.  ... 
arXiv:1905.00229v1 fatcat:ltcqbfrnyndu7hgirt3y7spm4q

How to talk so your robot will learn: Instructions, descriptions, and pragmatics [article]

Theodore R Sumers, Robert D Hawkins, Mark K Ho, Thomas L Griffiths, Dylan Hadfield-Menell
2022 arXiv   pre-print
To address this challenge, we consider social learning in a linear bandit setting and ask how a human might communicate preferences over behaviors (i.e. the reward function).  ...  We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts spontaneous human behavior, and (2) our pragmatic listener is able to recover their reward functions  ...  TRS is supported by the NDSEG Fellowship Program and RDH is supported by the NSF (grant #1911835).  ... 
arXiv:2206.07870v1 fatcat:nimwgsqg5zbufdj4ucyfcf5bsm

Active Preference-Based Gaussian Process Regression for Reward Learning [article]

Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
2020 arXiv   pre-print
One common approach is to learn reward functions from collected expert demonstrations.  ...  Instead, we model the reward function using a Gaussian Process (GP) and propose a mathematical formulation to actively find a GP using only human preferences.  ...  Katz, Amir Maleki and Juan Carlos Aragon for the discussions on alternative ways to ease feature design. We acknowledge funding by Allstate, FLI grant RFP2-000, and NSF grants #1941722 and #1849952.  ... 
arXiv:2005.02575v2 fatcat:q6wl6jrkmrahbna3ysipcfg3cm
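A rough sketch of the non-parametric idea in this abstract: represent candidate reward functions as samples from a GP prior over a 1-D state space, then weight each sample by how well it explains pairwise preferences. The RBF kernel, query points, and logistic choice model are illustrative assumptions (the paper derives a proper GP posterior rather than this importance-weighting shortcut):

```python
import numpy as np

rng = np.random.default_rng(2)
xs = np.linspace(0.0, 1.0, 20)

# RBF Gram matrix and GP prior samples over the 20 query points.
K = np.exp(-0.5 * ((xs[:, None] - xs[None, :]) / 0.2) ** 2)
samples = rng.multivariate_normal(np.zeros(20), K + 1e-6 * np.eye(20), size=300)

# Preference data: the user prefers the state at index i over index j.
prefs = [(15, 3), (18, 5), (12, 2)]

# Importance-weight each sampled reward under a logistic choice model.
logw = np.zeros(300)
for i, j in prefs:
    logw += -np.log1p(np.exp(-(samples[:, i] - samples[:, j])))
w = np.exp(logw - logw.max())
w /= w.sum()

posterior_mean = w @ samples     # weighted mean reward over xs
```

The weighted mean rises at the preferred states without committing to any parametric reward shape, which is the appeal of the GP formulation.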

Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

Letian Chen, Rohan Paleja, Muyleng Ghuy, Matthew Gombolay
2020 Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction  
On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations.  ...  adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward.  ...  Therefore, strategy-only rewards learned by MSRD capture specific preferences within demonstrations.  ... 
doi:10.1145/3319502.3374791 dblp:conf/hri/ChenPGG20 fatcat:ghq3dmz5d5a6jobw2x2ucvc6ju

Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach [article]

Huixin Zhan, Feng Tao, Yongcan Cao
2020 arXiv   pre-print
A more practical approach is to replace human demonstrations by human queries, i.e., preference-based reinforcement learning.  ...  continuous, high-dimensional reward function.  ...  Integrating human demonstrations and preferences fits a recent trend of communicating complex personalized objectives to deep learning systems in, e.g., inverse reinforcement learning [6] , [19] , [  ... 
arXiv:2010.07467v1 fatcat:kscg6oykwvdchidrlt4iv5vrqa

Simultaneous Learning of Objective Function and Policy from Interactive Teaching with Corrective Feedback

Carlos Celemin, Jens Kober
2019 2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)  
Some imitation learning approaches rely on Inverse Reinforcement Learning (IRL) methods, to decode and generalize implicit goals given by expert demonstrations.  ...  Experimental results show that the learned reward functions obtain similar performance in RL processes compared to engineered reward functions used as baseline, both in simulated and real environments.  ...  In Algorithm 3, the steps that integrate the incremental process of learning policies and objective functions with vague human corrections are listed.  ... 
doi:10.1109/aim.2019.8868805 dblp:conf/aimech/CeleminK19 fatcat:x7dhf4rtaffefk4waqnpdlibxe

Reinforcement Learning With Human Advice: A Survey

Anis Najar, Mohamed Chetouani
2021 Frontiers in Robotics and AI  
In this paper, we provide an overview of the existing methods for integrating human advice into a reinforcement learning process.  ...  Finally, we review different approaches for integrating advice into the learning process.  ...  ACKNOWLEDGMENTS This work was supported by the Romeo2 project. This manuscript has been released as a pre-print at arXiv (Najar and Chetouani, 2020) .  ... 
doi:10.3389/frobt.2021.584075 pmid:34141726 pmcid:PMC8205518 fatcat:fqipip7cp5hvlo22xsonqeqzcq

Reinforcement learning with human advice: a survey [article]

Anis Najar, Mohamed Chetouani
2020 arXiv   pre-print
In this paper, we provide an overview of the existing methods for integrating human advice into a Reinforcement Learning process.  ...  Finally, we review different approaches for integrating advice into the learning process.  ...  Acknowledgments This work was supported by the Romeo2 project.  ... 
arXiv:2005.11016v2 fatcat:kvomaemvrzfq3lebfewnn4rdqq

Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks

Stefanos Nikolaidis, Ramya Ramakrishnan, Keren Gu, Julie Shah
2015 Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI '15  
These demonstrated sequences are also used by the robot to learn a reward function that is representative for each type, through the employment of an inverse reinforcement learning algorithm.  ...  of this new user and will be robust to deviations of the human actions from prior demonstrations.  ...  We posited that the reward function learned by the proposed framework for each cluster of demonstrated action sequences would accurately represent the goals of participants with that preference, and would  ... 
doi:10.1145/2696454.2696455 dblp:conf/hri/NikolaidisRGS15 fatcat:yuzba4bg3zhv3jn3yzm6mgh6ve
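The pipeline this abstract describes — group demonstrated sequences by user type, then learn a reward per type — might look like the following toy version, where a tiny k-means stands in for their clustering and a feature-direction heuristic stands in for the per-cluster IRL step. The "speed vs. safety" feature vectors are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two synthetic user types: demonstrators who value speed vs. safety.
demos = np.vstack([
    rng.normal([1.0, 0.1], 0.05, size=(10, 2)),   # speed-oriented
    rng.normal([0.1, 1.0], 0.05, size=(10, 2)),   # safety-oriented
])

# Tiny 2-means clustering of demonstration feature vectors.
centers = demos[[0, -1]].copy()
for _ in range(20):
    labels = np.argmin(((demos[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.stack([demos[labels == k].mean(axis=0) for k in range(2)])

# Stand-in for per-cluster IRL: a unit linear reward aligned with each
# cluster's mean features (real IRL would fit this from trajectories).
rewards = centers / np.linalg.norm(centers, axis=1, keepdims=True)
```

A new user is then matched to the nearest cluster, and the robot plans against that cluster's reward rather than a one-size-fits-all objective.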

Towards Intrinsic Interactive Reinforcement Learning [article]

Benjamin Poole, Minwoo Lee
2022 arXiv   pre-print
With the rising interest in human-in-the-loop (HITL) applications, RL algorithms have been adapted to account for human guidance giving rise to the sub-field of interactive reinforcement learning (IRL)  ...  These two ideas have set RL and BCI on a collision course for one another through the integration of BCI into the IRL framework where intrinsic feedback can be utilized to help train an agent.  ...  Millán, Brad Knox, and Peter Stone for taking the time to provide valuable feedback.  ... 
arXiv:2112.01575v2 fatcat:orpakgcvbrdh5ddhpmpbtxneau

Learning from Richer Human Guidance

Chandrayee Basu, Mukesh Singhal, Anca D. Dragan
2018 Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction - HRI '18  
We focus on learning the desired objective function for a robot. Although trajectory demonstrations can be very informative of the desired objective, they can also be difficult for users to provide.  ...  We focus on augmenting comparisons with feature queries, and introduce a unified formalism for treating all answers as observations about the true desired reward.  ...  This is evidenced by them preferring the robot that optimizes the reward function learned through rich queries over the robot that optimizes the reward learned through comparison-only queries.  ... 
doi:10.1145/3171221.3171284 dblp:conf/hri/BasuSD18 fatcat:afot7cclebgsvfnccewmxtthaq

Comparative Analysis of Human Movement Prediction: Space Syntax and Inverse Reinforcement Learning [article]

Soma Suzuki
2018 arXiv   pre-print
In this paper, a comparative analysis of space syntax metrics and maximum entropy inverse reinforcement learning (MEIRL) is performed.  ...  Space syntax metrics have been the main approach for human movement prediction in the urban environment.  ...  The acquired reward function explains the intrinsic preferences of the policy demonstrator, or expert agents.  ... 
arXiv:1801.00464v2 fatcat:w4nuizhdabatndwf2lmc5qcoqi
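The MEIRL idea referenced above can be sketched over a small enumerable set of candidate paths: fit weights `w` so that the feature expectation under the exponential path distribution p(path) ∝ exp(w · φ(path)) matches the observed walks' average features. The path features and demonstration statistics below are made up for illustration:

```python
import numpy as np

# One feature row per candidate path, e.g. [directness, visibility].
phi = np.array([[0.2, 0.1],
                [0.9, 0.1],
                [0.2, 0.8],
                [0.9, 0.8]])
demo_feat = np.array([0.5, 0.3])   # average features of observed walks

# Gradient ascent on the MaxEnt log-likelihood: the gradient is the
# gap between demonstrated and model feature expectations.
w = np.zeros(2)
for _ in range(2000):
    p = np.exp(phi @ w)
    p /= p.sum()
    w += 0.5 * (demo_feat - p @ phi)

p = np.exp(phi @ w)
p /= p.sum()
model_feat = p @ phi               # model's feature expectation
```

At convergence `model_feat` matches `demo_feat`, which is exactly the feature-matching condition that defines the MaxEnt IRL solution; full MEIRL computes the same expectations over an MDP with dynamic programming instead of path enumeration.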

Co-active Learning to Adapt Humanoid Movement for Manipulation [article]

Ren Mao, John S. Baras, Yezhou Yang, Cornelia Fermuller
2016 arXiv   pre-print
The framework also considers user's feedback towards the adapted trajectories, and it learns to adapt movement through human-in-the-loop interactions.  ...  It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints.  ...  Although this reward function can be recovered from demonstrations by Inverse Optimal Control, as [13] suggests, it assumes that demonstrations are from experts, which bears an oracle reward function  ... 
arXiv:1609.03628v1 fatcat:2dulshpiuvhinn7lhi4ibnw26q