
Deep reinforcement learning from human preferences [article]

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
2017 arXiv   pre-print
These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.  ...  For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems.  ...  Compared to all prior work, our key contribution is to scale human feedback up to deep reinforcement learning and to learn much more complex behaviors.  ... 
arXiv:1706.03741v3 fatcat:b2phuyaq7fay7chweuqdkbo4ae

Deep Reinforcement Learning Using Neurophysiological Signatures of Interest

Victor Shih, David Jangraw, Sameer Saproo, Paul Sajda
2017 Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction - HRI '17  
We present a study where human neurophysiological signals are used as implicit feedback to alter the behavior of a deep learning based autonomous driving agent in a simulated virtual environment.  ...  INTRODUCTION Deep Reinforcement Learning can be a powerful methodology for mediating interactions between humans and artificial intelligence (AI).  ...  Deep Reinforcement Learning To train the AI agent to navigate the virtual environment, we used a deep reinforcement learning paradigm [3] that optimizes the function for learning the correct action under  ... 
doi:10.1145/3029798.3038399 dblp:conf/hri/ShihJSS17 fatcat:zyvne6bxxzf3rorsqoxwpe7bam

Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds [article]

Zhiyu Lin, Brent Harrison, Aaron Keech, Mark O. Riedl
2021 arXiv   pre-print
We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep-reinforcement learning to model the confidence  ...  This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment.  ...  Preliminary experiments (in preparation) show that humans prefer giving action advice over critique. This paper looks at incorporating advice from human teachers into deep reinforcement learning.  ... 
arXiv:1709.03969v2 fatcat:inras67hrzdbrbdzbdrmrcpety

Leveraging Human Guidance for Deep Reinforcement Learning Tasks [article]

Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone
2019 arXiv   pre-print
Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment.  ...  Human knowledge of how to solve these tasks can be incorporated using imitation learning, where the agent learns to imitate human demonstrated decisions.  ...  A portion of this work has taken place in the Learning Agents Research Group (LARG) at UT Austin.  ... 
arXiv:1909.09906v1 fatcat:jprzobqel5cmvczkexojifbfoa

Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach [article]

Huixin Zhan, Feng Tao, Yongcan Cao
2020 arXiv   pre-print
A more practical approach is to replace human demonstrations by human queries, i.e., preference-based reinforcement learning.  ...  To reduce and minimize the need for human queries, we propose a new GAN-assisted human preference-based reinforcement learning approach that uses a generative adversarial network (GAN) to actively learn  ...  Integrating human demonstrations and preferences fits a recent trend of communicating complex personalized objectives to deep learning systems in, e.g., inverse reinforcement learning [6], [19], [  ... 
arXiv:2010.07467v1 fatcat:kscg6oykwvdchidrlt4iv5vrqa

Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest [article]

Victor Shih, David C Jangraw, Paul Sajda, Sameer Saproo
2017 arXiv   pre-print
to idiosyncratic human preferences.  ...  However, Human-AI interaction for such AI agents should include additional reinforcement that is implicit and subjective -- e.g. human preferences for certain AI behavior -- in order to adapt the AI behavior  ...  Deep Reinforcement Learning To train the AI agent to navigate the virtual environment, we used a deep reinforcement learning paradigm [1] that optimizes the function for learning the correct action under  ... 
arXiv:1709.04574v1 fatcat:eud2yqbsmnbvvpmiw7htwrs3va

Personalization of Hearing Aid Compression by Human-In-Loop Deep Reinforcement Learning [article]

Nasim Alamdari, Edward Lobarinas, Nasser Kehtarnavaz
2020 arXiv   pre-print
This paper presents a human-in-loop deep reinforcement learning approach that personalizes hearing aid compression to achieve improved hearing perception.  ...  Nearly half of hearing aid users prefer settings that differ from the commonly prescribed settings.  ...  In our case, in order to model and learn hearing preferences via deep reinforcement learning, the listener's preferences are used.  ... 
arXiv:2007.00192v1 fatcat:ujcxho4e5vbsdmzrh4jd5cd5rq

Reward learning from human preferences and demonstrations in Atari [article]

Borja Ibarz and Jan Leike and Tobias Pohlen and Geoffrey Irving and Shane Legg and Dario Amodei
2018 arXiv   pre-print
In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences.  ...  We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games.  ...  Moreover, we thank Elizabeth Barnes for proofreading the paper and Ashwin Kakarla, Ethel Morgan, and Yannis Assael for helping us set up the human experiments.  ... 
arXiv:1811.06521v1 fatcat:pxw5cgmnrbbsxluwlaa4p3ggja


Hugo Scurto, Tiffon Vincent, Bell Jonathan, de Paiva Santana Charles
2022 Zenodo  
Based on technical description of the Co-Explorer, a deep reinforcement learning agent designed to support sonic exploration through positive or negative human feedback, I discuss how deep reinforcement  ...  In this paper, I relate an auto-reflexive analysis of my practice of designing and musicking deep reinforcement learning.  ...  reinforcement learning.  ... 
doi:10.5281/zenodo.6668960 fatcat:jhlqwgi3bzfqrhfddtgczw73sa

Personalization of Hearing Aid Compression by Human-in-the-Loop Deep Reinforcement Learning

N. Alamdari, E. Lobarinas, N. Kehtarnavaz
2020 IEEE Access  
These data demonstrate the proof-of-concept of achieving personalized compression via human-in-the-loop deep reinforcement learning.  ...  This paper presents a human-in-the-loop deep reinforcement learning approach that personalizes hearing aid compression to achieve improved hearing perception.  ...  In our case, in order to model and learn hearing preferences via deep reinforcement learning, the listener's preferences are used.  ... 
doi:10.1109/access.2020.3035728 fatcat:eirku25wu5dfbmucgkbwdeboj4

A Review on Interactive Reinforcement Learning from Human Social Feedback

Jinying Lin, Zhen Ma, Randy Gomez, Keisuke Nakamura, Bo He, Guangliang Li
2020 IEEE Access  
This paper reviews methods for interactive reinforcement learning agent to learn from human social feedback and the ways of delivering feedback.  ...  Interactive reinforcement learning has been developed to speed up the agent's learning and facilitate to learn from ordinary people by allowing them to provide social feedback, e.g., evaluative feedback  ...  We then introduce interactive reinforcement learning, where agent learn from feedback provided by human trainers. A.  ... 
doi:10.1109/access.2020.3006254 fatcat:omtnm6g6lvfenduatelslnc5ce

Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

Eiji Uchibe
2017 Neural Processing Letters  
This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures.  ...  The proposed deep forward and inverse reinforcement learning is applied into two benchmark games: Atari 2600 and Reversi.  ...  Deep Inverse Reinforcement Learning Bellman Equation for IRL We here show a derivation of the simplified Bellman equation for inverse reinforcement learning. From Eqs.  ... 
doi:10.1007/s11063-017-9702-7 fatcat:7mxcw6qfn5baxfsixek7jfnn3q

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [article]

Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
2019 arXiv   pre-print
When combined with deep reinforcement learning, T-REX outperforms state-of-the-art imitation learning and IRL methods on multiple Atari and MuJoCo benchmark tasks and achieves performance that is often  ...  A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.  ...  Ibarz et al. (2018) combine Deep Q-learning from demonstrations and active preference queries (DQfD+A).  ... 
arXiv:1904.06387v5 fatcat:rglnjfhb2zg5reugvmxgv4sofi

Weak Human Preference Supervision For Deep Reinforcement Learning [article]

Zehong Cao, KaiChiu Wong, Chin-Teng Lin
2020 arXiv   pre-print
The current reward learning from human preferences could be used to resolve complex reinforcement learning (RL) tasks without access to a reward function by defining a single fixed preference between pairs  ...  We believe that our naturally inspired human preferences with weakly supervised learning are beneficial for precise reward learning and can be applied to state-of-the-art RL systems, such as human-autonomy  ...  INTRODUCTION Reinforcement learning (RL) [1] has intensively used the reward function to train agent's behaviours for a specified task.  ... 
arXiv:2007.12904v2 fatcat:uo3kz6ezpnhy3if2vtf2kzrqlm

Embodiment Adaptation from Interactive Trajectory Preferences

Michael Walton, Benjamin Migliori, John Reeder
2018 European Conference on Principles of Data Mining and Knowledge Discovery  
Background Recent advances in reinforcement learning (RL) have largely been driven by scaling algorithms well understood in simple task domains to complex, high-dimensional problems using deep neural networks  ...  Prior work has also explored imitation learning to improve the sample efficiency of reinforcement learning [3], [4].  ... 
dblp:conf/pkdd/WaltonMR18 fatcat:ngpcphyb7rdbvcnchqt5ii4k7a