578,389 Hits in 2.6 sec

Active Reward Learning

Christian Daniel, Malte Viering, Jan Metz, Oliver Kroemer, Jan Peters
2014 Robotics: Science and Systems X  
Instead, we propose to learn the reward function through active learning, querying human expert knowledge for a subset of the agent's rollouts.  ...  We introduce a framework wherein a traditional learning algorithm interplays with the reward learning component, such that the evolution of the action learner guides the queries of the reward learner.  ...  TABLE I: the algorithmic form of active reward learning. TABLE II: comparison of query-always, no-reward-learning, semi-supervised, and supervised reward learning  ... 
doi:10.15607/rss.2014.x.031 dblp:conf/rss/DanielVMK014 fatcat:nhxefkdrqjebjh3mkmzkh4vosq

Dopamine, reward learning, and active inference

Thomas H. B. FitzGerald, Raymond J. Dolan, Karl Friston
2015 Frontiers in Computational Neuroscience  
Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning.  ...  Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states  ...  The first concerns the acquisition of reward contingencies, known to induce characteristic changes in phasic activity in the dopaminergic system such that, over the course of learning, responses to rewarding  ... 
doi:10.3389/fncom.2015.00136 pmid:26581305 pmcid:PMC4631836 fatcat:kp26otp4x5eddmrgh4fb72v5aq

Prior Preference Learning from Experts: Designing a Reward with Active Inference [article]

Jin young Shin, Cheolhyeong Kim, Hyung Ju Hwang
2021 arXiv   pre-print
Experimental results of prior preference learning show the possibility of active inference with EFE-based rewards and its application to an inverse RL problem.  ...  In this paper, we claim that active inference can be interpreted using reinforcement learning (RL) algorithms and find a theoretical connection between them.  ...  Therefore, learning the expert as a prior preference p immediately constructs its active-inference-based reward function, and is thereby applicable to a reward construction problem and several inverse learning  ... 
arXiv:2101.08937v3 fatcat:64qpdz7ydzhxvjmkx66exqyd5i

Active Reinforcement Learning: Observing Rewards at a Cost [article]

David Krueger, Jan Leike, Owain Evans, John Salvatier
2020 arXiv   pre-print
Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.  ...  The central question of ARL is how to quantify the long-term value of reward information.  ...  In analogy with active learning, an ARL agent chooses when to observe the reward signal.  ... 
arXiv:2011.06709v2 fatcat:x5sce4tcavhrjjtqjybapmkeru
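The query-cost setting described in this entry can be pictured with a toy sketch. This is not the paper's algorithm: the bandit setting, the epsilon-greedy policy, and the 1/(n+1) "stop querying once estimates stabilize" heuristic are all illustrative assumptions; the point is only that the agent pays c each time it chooses to observe its reward.

```python
import random

def arl_bandit(true_means, cost=0.1, horizon=1000, query_threshold=0.05, seed=0):
    """Toy active-RL bandit: observing the reward costs `cost`, so the
    agent queries only while its value estimates are still changing."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of observed rewards per arm
    estimates = [0.0] * n_arms   # running mean of observed rewards
    total_return = 0.0
    for _ in range(horizon):
        # epsilon-greedy action selection on current estimates
        if rng.random() < 0.1:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = true_means[arm] + rng.gauss(0, 0.1)
        total_return += reward
        # crude value-of-information proxy: query while the next sample
        # would still move the running mean by more than query_threshold
        if counts[arm] == 0 or 1.0 / (counts[arm] + 1) > query_threshold:
            total_return -= cost              # pay the query cost
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # otherwise act without observing the reward signal
    return total_return, estimates
```

After enough queries per arm, the agent keeps acting greedily without paying for further reward observations, which is the trade-off the ARL paper formalizes.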

Active Learning for Reward Estimation in Inverse Reinforcement Learning [chapter]

Manuel Lopes, Francisco Melo, Luis Montesano
2009 Lecture Notes in Computer Science  
In this paper, we introduce active learning for inverse reinforcement learning.  ...  Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator.  ...  Active Learning for Reward Estimation In the previous section we discussed two possible (Bayesian) approaches to the IRL problem.  ... 
doi:10.1007/978-3-642-04174-7_3 fatcat:ey3tls7my5hc7jwspib5mmm4fa

Active Preference-Based Learning of Reward Functions

Dorsa Sadigh, Anca Dragan, Shankar Sastry, Sanjit Seshia
2017 Robotics: Science and Systems XIII  
We thus take an active learning approach, in which the system decides on what preference queries to make.  ...  Second, the learned reward function strongly depends on what environments and trajectories were experienced during the training phase.  ...  The reward function learned through our algorithm is closer to the true reward compared to the non-active baseline. H2.  ... 
doi:10.15607/rss.2017.xiii.053 dblp:conf/rss/SadighDSS17 fatcat:lofk535fjngdfodarjjmy547ry
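The active preference-query idea in this entry can be sketched as follows. This is illustrative, not the authors' implementation: the Bradley-Terry likelihood, the value of beta, the sample-based posterior, and the "pick the most ambiguous pair" selection rule are assumptions standing in for the paper's query-synthesis machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

def preference_likelihood(w, fa, fb, beta=5.0):
    """Bradley-Terry model: P(prefer A) under linear reward r = w . phi."""
    return 1.0 / (1.0 + np.exp(-beta * (w @ (fa - fb))))

def select_query(w_samples, candidate_pairs, beta=5.0):
    """Active query selection: ask about the trajectory pair whose
    predicted preference is closest to 50/50 under the posterior samples."""
    def ambiguity(pair):
        fa, fb = pair
        return abs(preference_likelihood(w_samples, fa, fb, beta).mean() - 0.5)
    return min(candidate_pairs, key=ambiguity)

def update(w_samples, fa, fb, prefer_a, beta=5.0):
    """Importance-resample the weight samples given the human's answer."""
    p = preference_likelihood(w_samples, fa, fb, beta)
    lik = p if prefer_a else 1.0 - p
    lik = lik / lik.sum()
    idx = rng.choice(len(w_samples), size=len(w_samples), p=lik)
    return w_samples[idx]
```

Each answered query prunes weight samples inconsistent with the preference, so the sampled posterior concentrates on the true reward direction.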

Active Preference-Based Gaussian Process Regression for Reward Learning [article]

Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
2020 arXiv   pre-print
Instead, we model the reward function using a Gaussian Process (GP) and propose a mathematical formulation to actively find a GP using only human preferences.  ...  One common approach is to learn reward functions from collected expert demonstrations.  ...  ACKNOWLEDGMENTS We thank Farid Soroush for the early discussions on active preference-based GP regression; Sydney M.  ... 
arXiv:2005.02575v2 fatcat:q6wl6jrkmrahbna3ysipcfg3cm
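The GP reward model in this entry can be suggested with a simplified stand-in: the paper fits the GP from human preferences, whereas the sketch below does ordinary GP regression on hypothetical direct reward observations, plus a maximum-variance query rule; the RBF kernel and its hyperparameters are arbitrary assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between two sets of feature vectors."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d / lengthscale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=1e-3):
    """Standard GP regression posterior mean/variance over reward values."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, var

def next_query(X_train, y_train, X_pool):
    """Active querying: ask the human about the most uncertain candidate."""
    _, var = gp_posterior(X_train, y_train, X_pool)
    return X_pool[np.argmax(var)]
```

The nonparametric posterior is what lets the method escape the linear-reward assumption of the earlier preference-based work, at the cost of a kernel choice.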

Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [article]

Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu
2021 arXiv   pre-print
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.  ...  We compare our algorithm with the state-of-the-art RL algorithms for non-Markovian reward functions, such as Joint Inference of Reward machines and Policies for RL (JIRP), Learning Reward Machine (LRM)  ...  Active Reinforcement Learning Engine In this subsection, we introduce the active reinforcement learning engine.  ... 
arXiv:2006.15714v4 fatcat:6hmukogdlnbernfi36bvl4kwbm
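The finite reward automaton this entry refers to can be pictured with a minimal hand-written machine. The paper infers such machines with the L* algorithm; the class below and its "coffee"/"office" event labels are purely illustrative.

```python
class RewardMachine:
    """Minimal finite reward automaton: transitions over high-level events
    (labels) emit rewards, giving the RL agent a non-Markovian reward
    memory it can condition on."""

    def __init__(self, transitions, initial=0):
        # transitions: {(state, label): (next_state, reward)}
        self.transitions = transitions
        self.initial = initial

    def step(self, state, label):
        """Advance the machine on one event; unknown events self-loop
        with zero reward."""
        return self.transitions.get((state, label), (state, 0.0))


# Hypothetical two-step task: reward only after "coffee" then "office".
OFFICE_TASK = RewardMachine({(0, "coffee"): (1, 0.0),
                             (1, "office"): (2, 1.0)})
```

An RL agent augments its observation with the machine state, so the product of environment and automaton is Markovian even though the reward itself is not.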

Batch Active Preference-Based Learning of Reward Functions [article]

Erdem Bıyık, Dorsa Sadigh
2018 arXiv   pre-print
In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short  ...  While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions.  ...  Active reinforcement learning: Observing rewards at a cost. In Future of Interactive Learning Machines, NIPS Workshop, 2016. [28] G. Andrew and J. Gao.  ... 
arXiv:1810.04303v1 fatcat:polkkouvsrhgnjqewkaxqjyghu

APReL: A Library for Active Preference-based Reward Learning Algorithms [article]

Erdem Bıyık, Aditi Talati, Dorsa Sadigh
2022 arXiv   pre-print
In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enable researchers and practitioners to experiment with the existing techniques and easily develop  ...  Many preference-based learning algorithms and active querying techniques have been proposed as a solution to this problem.  ...  Active Preference-Based Learning. Active preferencebased reward learning is a well-studied problem in machine learning and robotics.  ... 
arXiv:2108.07259v2 fatcat:lbvicxy2l5fxxitliuk5y2fh24

Reward-Motivated Learning: Mesolimbic Activation Precedes Memory Formation

R. Alison Adcock, Arul Thangavel, Susan Whitfield-Gabrieli, Brian Knutson, John D.E. Gabrieli
2006 Neuron  
The findings are consistent with the hypothesis that reward motivation promotes memory formation via dopamine release in the hippocampus prior to learning.  ...  In the encoding task, high-reward cues preceding remembered but not forgotten scenes activated the ventral tegmental area, nucleus accumbens, and hippocampus.  ...  Discussion The present findings identify a neural system that supports motivated learning, promoting memory formation prior to learning on the basis of anticipated reward.  ... 
doi:10.1016/j.neuron.2006.03.036 pmid:16675403 fatcat:rsu7qn2divgabikpt5udl6lpia

Computer-Assisted Fraud Detection, From Active Learning to Reward Maximization [article]

Christelle Marfaing, Alexandre Garcia
2018 arXiv   pre-print
Due to the availability of human feedback, this task has been studied in the framework of active learning: the fraud predictor is allowed to sequentially call on an oracle.  ...  The active learning is run during the entire experiment. In this experiment, we empirically show that active learning methods do not maximize the cumulated reward we defined.  ...  • albl (Active Learning By Learning): a multi-armed bandit chooses among multiple active learning strategies at each time step in order to maximise an expected cumulated reward which is a weighted accuracy  ... 
arXiv:1811.08212v1 fatcat:gzgetwhvdzcxpbkx47xrvlnttu

From lecture to active learning: Rewards for all, and is it really so difficult? [article]

David Pengelley
2019 arXiv   pre-print
We describe the evolution of a personal non-lecture active learning pedagogy developed in numerous courses at all university levels.  ...  We discuss challenges, rewards, and buy-in for both students and instructors.  ...  The main message is that it needn't be difficult to create active learning for your students, and that there are tremendous rewards for the instructor as well as for students.  ... 
arXiv:1908.02389v1 fatcat:yzq74kwxxbhjveesfldyyvrmvi

Correlates of reward-predictive value in learning-related hippocampal neural activity

Murat Okatan
2009 Hippocampus  
Here, the predictive value signal is used to explain the time course of learning-related changes in the activity of hippocampal neurons in monkeys performing an associative learning task.  ...  Two learning signals that are derived from this algorithm, the predictive value and the prediction error, have been shown to explain changes in neural activity and behavior during learning across species  ...  Learning-related hippocampal neural activity and reward-predictive value.  ... 
doi:10.1002/hipo.20535 pmid:19123250 pmcid:PMC2742500 fatcat:cy66ghaurjasfdzlsys6jqq5gm

Active Reward Learning for Co-Robotic Vision Based Exploration in Bandwidth Limited Environments [article]

Stewart Jamieson, Jonathan P. How, Yogesh Girdhar
2020 arXiv   pre-print
We introduce a novel active reward learning strategy based on making queries to help the robot minimize path "regret" online, and evaluate it for suitability in autonomous visual exploration through simulations  ...  We demonstrate that, in some bandwidth-limited environments, this novel regret-based criterion enables the robotic explorer to collect up to 17% more reward per mission than the next-best criterion.  ...  This is a unique challenge for active learning. IV. ONLINE ACTIVE REWARD LEARNING FOR POMDPS Here we will consider active learning strategies to learn the parameters of a POMDP reward model online.  ... 
arXiv:2003.05016v1 fatcat:wsetdojquvapnb2rdi6ktjmuie
Showing results 1 — 15 out of 578,389 results