
Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning [chapter]

Laëtitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat
2006 Lecture Notes in Computer Science  
We develop a theoretical study and also provide experimental justifications for choosing on the one hand the reward function, and on the other hand particular initial Q-values based on a goal bias function  ...  Indeed, although RL convergence properties have been widely studied, no precise rules exist to correctly choose the reward function and initial Q-values.  ...  Table 1. Better choices of reward function and initial Q-values for goal-directed RL.  ... 
doi:10.1007/11840817_87 fatcat:2z6knufbh5aszowb5xzvmc32zm
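
A minimal tabular sketch of the kind of goal-bias initialization this abstract refers to: Q-values start at an optimistic estimate derived from distance to the goal instead of zero. The grid size, discount, bias function, and learning rate below are illustrative assumptions, not the paper's settings.

```python
# Sketch: goal-biased initial Q-values for a small gridworld (assumed setup,
# not the paper's exact formulation). Each Q(s, a) starts at gamma**d(s, goal) * r_goal,
# an optimistic estimate based on Manhattan distance to the goal.
import numpy as np

SIZE, GAMMA, R_GOAL = 5, 0.9, 1.0
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def goal_bias(state):
    """Heuristic initial value: discounted goal reward at Manhattan distance."""
    d = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return (GAMMA ** d) * R_GOAL

# Initialize every Q(s, a) with the goal-bias heuristic instead of zeros.
Q = {(x, y): np.full(len(ACTIONS), goal_bias((x, y)))
     for x in range(SIZE) for y in range(SIZE)}

def step(state, a):
    """Deterministic transitions; reward only on reaching the goal."""
    nx = min(max(state[0] + ACTIONS[a][0], 0), SIZE - 1)
    ny = min(max(state[1] + ACTIONS[a][1], 0), SIZE - 1)
    nxt = (nx, ny)
    return nxt, (R_GOAL if nxt == GOAL else 0.0), nxt == GOAL

rng = np.random.default_rng(0)
for _ in range(200):                      # plain tabular Q-learning episodes
    s, done = (0, 0), False
    while not done:
        a = int(rng.integers(4)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += 0.5 * (target - Q[s][a])
        s = s2
```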

Value-Added Chemical Discovery Using Reinforcement Learning [article]

Peihong Jiang, Hieu Doan, Sandeep Madireddy, Rajeev Surendran Assary, Prasanna Balaprakash
2019 arXiv   pre-print
With a more versatile formulation of the problem as a Markov decision process, we address the problem using deep reinforcement learning techniques and present promising preliminary results.  ...  Finding viable and short pathways from sugar molecules to value-added chemicals can be modeled as a retrosynthesis planning problem with a catalyst allowed.  ...  We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.  ... 
arXiv:1911.07630v1 fatcat:y5crvdqba5ht5o7nq72ukbusoe

Automatic Curriculum Learning through Value Disagreement [article]

Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto
2020 arXiv   pre-print
Continually solving new, unsolved tasks is the key to learning diverse behaviors. Through reinforcement learning (RL), we have made massive strides towards solving tasks that have a single goal.  ...  When biological agents learn, there is often an organized and meaningful order to which learning happens.  ...  Acknowledgements: We gratefully acknowledge the support Berkeley DeepDrive, NSF, and the ONR Pecase award. We also thank AWS for computational resources.  ... 
arXiv:2006.09641v1 fatcat:eozppmwgtjgs7i35jjvdmgschq
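
The curriculum idea in this abstract, sampling goals where an ensemble of value estimates disagrees, can be sketched as a simple scoring and sampling routine. The ensemble size, scoring rule (standard deviation), and toy value functions below are assumptions for illustration only.

```python
# Sketch: rank candidate goals by disagreement (std. dev.) across an ensemble of
# goal-conditioned value estimates, then sample training goals in proportion to
# that disagreement. Nothing here reproduces the paper's trained networks.
import numpy as np

def sample_goals_by_disagreement(value_ensemble, state, candidate_goals, rng, k=4):
    """value_ensemble: list of callables V_i(state, goal) -> float."""
    scores = []
    for g in candidate_goals:
        estimates = np.array([v(state, g) for v in value_ensemble])
        scores.append(estimates.std())        # high disagreement = frontier of competence
    probs = np.array(scores)
    probs = probs / probs.sum() if probs.sum() > 0 else np.full(len(scores), 1 / len(scores))
    idx = rng.choice(len(candidate_goals), size=k, replace=False, p=probs)
    return [candidate_goals[i] for i in idx]

# Toy usage with random linear "value functions" standing in for trained networks.
rng = np.random.default_rng(0)
ensemble = [lambda s, g, w=rng.normal(size=2): float(w @ g) for _ in range(5)]
goals = [np.array([x, y], dtype=float) for x in range(3) for y in range(3)]
print(sample_goals_by_disagreement(ensemble, state=None, candidate_goals=goals, rng=rng))
```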

Value-driven Hindsight Modelling [article]

Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess
2020 arXiv   pre-print
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.  ...  Value estimation is a critical component of the reinforcement learning (RL) paradigm.  ...  Acknowledgments and Disclosure of Funding We thank the anonymous reviewers for their useful feedback.  ... 
arXiv:2002.08329v2 fatcat:ut75cfkv6vcipacgxn6wo3acau

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition [article]

Thomas G. Dietterich
1999 arXiv   pre-print
This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function  ...  The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the  ...  I particularly want to thank Eric Chown for encouraging me to study Feudal reinforcement learning, Ron Parr for providing the details of his HAM machines, and Sebastian Thrun for encouraging me to write a  ... 
arXiv:cs/9905014v1 fatcat:l2mlu4hr7rdyxeyppttfuf2vra
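
The decomposition this abstract describes can be summarized by the MAXQ identity Q(p, s, a) = V(a, s) + C(p, s, a): the value of invoking subtask a inside parent task p is the value of a itself plus a completion term for finishing p afterwards. Below is a tiny recursive evaluation sketch; the two-level hierarchy and hand-filled tables are placeholders, not the paper's Taxi domain.

```python
# Sketch of the MAXQ value decomposition Q(p, s, a) = V(a, s) + C(p, s, a).
# The hierarchy and tables below are illustrative placeholders.

# Hierarchy: Root -> {GoToGoal, move}; "move" is a primitive action.
CHILDREN = {"Root": ["GoToGoal", "move"], "GoToGoal": ["move"]}

# Learned tables (here hand-filled): V for primitive actions, C for composites.
V_primitive = {("move", "s0"): -1.0, ("move", "s1"): -1.0}
C = {("Root", "s0", "GoToGoal"): 0.0, ("Root", "s0", "move"): -3.0,
     ("GoToGoal", "s0", "move"): -1.0,
     ("Root", "s1", "GoToGoal"): 0.0, ("Root", "s1", "move"): -2.0,
     ("GoToGoal", "s1", "move"): 0.0}

def V(task, s):
    """Projected value of executing `task` starting in state s."""
    if task not in CHILDREN:                       # primitive action
        return V_primitive[(task, s)]
    return max(Q(task, s, a) for a in CHILDREN[task])

def Q(parent, s, a):
    """MAXQ decomposition: value of child a plus completion value for parent."""
    return V(a, s) + C[(parent, s, a)]

print(Q("Root", "s0", "GoToGoal"))  # -2.0: two unit-cost moves inside GoToGoal
```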

Transfer Value Iteration Networks [article]

Junyi Shen, Hankz Hankui Zhuo, Jin Xu, Bin Zhong, Sinno Jialin Pan
2019 arXiv   pre-print
Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains.  ...  Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as the number of iterations and  ...  ) of Ministry of Education of China, and Guangdong Province Key Laboratory of Big Data Analysis and Processing for the support of this research.  ... 
arXiv:1911.05701v2 fatcat:b4bgpb424bf33hjw7cyyzgryai
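
The value-iteration module inside a VIN can be read as a small recurrence that repeatedly applies a convolution-like Bellman backup and a max over action channels on a 2-D reward map. The NumPy sketch below shows that recurrence only; the hand-set transition shifts, grid, and iteration count are illustrative assumptions, and a trained VIN learns the kernel instead.

```python
# Sketch of the value-iteration module at the heart of a VIN: K iterations of
# "Q = backup(reward, V); V = max_a Q" on a 2-D grid, with hand-set shift-based
# transitions standing in for a learned convolution kernel.
import numpy as np

def vi_module(reward, n_iters=20, gamma=0.95):
    """reward: 2-D array of per-cell rewards. Returns the value map after n_iters."""
    H, W = reward.shape
    V = np.zeros_like(reward)
    shifts = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]   # one "channel" per move
    for _ in range(n_iters):
        Q = np.empty((len(shifts), H, W))
        for a, (dy, dx) in enumerate(shifts):
            # np.roll wraps at the borders; acceptable for a sketch only.
            shifted = np.roll(V, shift=(dy, dx), axis=(0, 1))
            Q[a] = reward + gamma * shifted       # Bellman backup per action channel
        V = Q.max(axis=0)                         # max over action channels
    return V

grid_reward = np.full((8, 8), -0.04)
grid_reward[7, 7] = 1.0                           # goal cell
print(np.round(vi_module(grid_reward), 2))
```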

Transfer Value Iteration Networks

Junyi Shen, Hankz Hankui Zhuo, Jin Xu, Bin Zhong, Sinno Pan
2020 Proceedings of the AAAI Conference on Artificial Intelligence  
Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains.  ...  Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as the number of iterations and  ...  ) of Ministry of Education of China, and Guangdong Province Key Laboratory of Big Data Analysis and Processing for the support of this research.  ... 
doi:10.1609/aaai.v34i04.6022 fatcat:wanhc424efaancb5aa32b2dmwu

Pragmatic-Pedagogic Value Alignment [article]

Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan
2018 arXiv   pre-print
We present a solution to the cooperative inverse reinforcement learning (CIRL) dynamic game based on well-established cognitive models of decision making and theory of mind.  ...  As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match those of their human users; this is known as the value-alignment problem.  ...  Cooperative Inverse Reinforcement Learning (CIRL) formulates value alignment as a two-player game in which a human and a robot share a common reward function, but only the human has knowledge of this reward  ... 
arXiv:1707.06354v2 fatcat:7swgocx7mjdkndep74ckg6kmra

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

T. G. Dietterich
2000 The Journal of Artificial Intelligence Research  
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function  ...  MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton.  ...  I particularly want to thank Eric Chown for encouraging me to study Feudal reinforcement learning, Ron Parr for providing the details of his HAM machines, and Sebastian Thrun for encouraging me to write  ... 
doi:10.1613/jair.639 fatcat:l6w6hgtsjfdcffmg7yykuchfo4

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion [article]

Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
2019 arXiv   pre-print
Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity.  ...  However, this is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will almost always be  ...  Also, we would like to thank Jascha Sohl-Dickstein, Joseph Antognini, Shane Gu, and Samy Bengio for their feedback during the writing process.  ... 
arXiv:1807.01675v2 fatcat:iw5dhdddabfbnj56uahnlwaqzy
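
The ensemble-value-expansion idea can be sketched as a reweighting step: candidate TD targets from several model-rollout horizons are combined with weights proportional to the inverse of their ensemble variance. The candidate targets below are stubbed random arrays; a real implementation would compute them from learned dynamics, reward, and Q-function ensembles.

```python
# Sketch of STEVE-style target reweighting: given per-horizon TD-target samples
# from an ensemble, weight each horizon by the inverse of its ensemble variance.
import numpy as np

def steve_target(candidates):
    """candidates: array of shape (n_horizons, n_ensemble) of TD target samples."""
    means = candidates.mean(axis=1)                  # per-horizon mean target
    variances = candidates.var(axis=1) + 1e-8        # per-horizon ensemble variance
    weights = (1.0 / variances) / (1.0 / variances).sum()
    return float((weights * means).sum())            # variance-weighted target

# Horizon 0 (model-free) is precise but possibly biased; longer horizons are noisier.
rng = np.random.default_rng(0)
cands = np.stack([rng.normal(loc=1.0, scale=s, size=8) for s in (0.05, 0.2, 0.6)])
print(steve_target(cands))
```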

Randomized Value Functions via Posterior State-Abstraction Sampling [article]

Dilip Arumugam, Benjamin Van Roy
2021 arXiv   pre-print
State abstraction has been an essential tool for dramatically improving the sample efficiency of reinforcement-learning algorithms.  ...  We introduce a practical algorithm for doing this using two posterior distributions over state abstractions and abstract-state values.  ...  And yet, many reinforcement-learning algorithms make no concerted effort to fully exploit this structure so as to accelerate learning of the optimal policy or value function [Watkins and Dayan, 1992,  ... 
arXiv:2010.02383v2 fatcat:3he5uir2krcevihukp4hepk5xe

Self-Consistent Models and Values [article]

Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
2021 arXiv   pre-print
In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a learned model and value function to be jointly self-consistent.  ...  Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.  ...  Acknowledgments and Disclosure of Funding We would like to thank Ivo Danihelka, Junhyuk Oh, Iurii Kemaev, and Thomas Hubert for valuable discussions and comments on the manuscript.  ... 
arXiv:2110.12840v1 fatcat:5ott7uqvavhodldt6nimv2ussu
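
One way to read "jointly self-consistent" is a penalty on the gap between the value of a state and the value implied by unrolling the learned model one step. The sketch below writes that penalty for toy linear model and value functions; it is an assumption about the general idea, not the paper's exact loss.

```python
# Sketch of a one-step self-consistency penalty between a learned model and a
# learned value function: V(s) should match r_hat(s) + gamma * V(s_hat'), where
# r_hat and s_hat' come from the model. Linear toy functions are placeholders.
import numpy as np

GAMMA = 0.99

def self_consistency_loss(V_w, model_A, model_b, reward_w, states):
    """Mean squared gap between V(s) and the model-predicted one-step backup."""
    losses = []
    for s in states:
        v = V_w @ s                                  # value estimate V(s)
        s_next_hat = model_A @ s + model_b           # model's next-state prediction
        r_hat = reward_w @ s                         # model's reward prediction
        backup = r_hat + GAMMA * (V_w @ s_next_hat)  # value implied by the model
        losses.append((v - backup) ** 2)
    return float(np.mean(losses))

rng = np.random.default_rng(0)
d = 4
states = [rng.normal(size=d) for _ in range(16)]
print(self_consistency_loss(rng.normal(size=d), np.eye(d) * 0.9,
                            rng.normal(size=d) * 0.1, rng.normal(size=d), states))
```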

Exploiting Multi-step Sample Trajectories for Approximate Value Iteration [chapter]

Robert Wright, Steven Loscalzo, Philip Dexter, Lei Yu
2013 Lecture Notes in Computer Science  
Our experiments demonstrate this approach has significant benefits including: better learned policy performance, improved convergence, and some decreased sensitivity to the choice of function approximation  ...  Acknowledgements This work is supported in part by grants from NSF (No. 0855204) and the AFRL Information Directorate's Visiting Faculty Research Program.  ... 
doi:10.1007/978-3-642-40988-2_8 fatcat:aogrl24xnncipf46qttmfyxhtq

The contribution of striatal pseudo-reward prediction errors to value-based decision-making [article]

Ernest Mas-Herrero, Guillaume Sescousse, Roshan Cools, Josep Marco-Pallares
2017 bioRxiv   pre-print
Together, our results indicate that pseudo-rewards generate learning signals in the striatum and subsequently bias choice behavior despite their lack of association with actual reward.  ...  Here we wanted to test the hypothesis that, despite not carrying any rewarding value per se, pseudo-rewards might generate a bias in choice behavior when reward contingencies are not well-known or uncertain  ...  Importantly, HRL includes a second value function tracking the attainment of subgoals/pseudo-rewards. This value function uses PRPEs to reinforce those actions leading to a certain subgoal.  ... 
doi:10.1101/097873 fatcat:rkypwbkrqfa4noxjus2oewmhxq
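
The "second value function" driven by pseudo-reward prediction errors (PRPEs) mentioned in this abstract can be written as an ordinary TD(0) update in which reaching a subgoal pays a pseudo-reward even when the external reward is zero. The states, step structure, and constants below are invented for illustration.

```python
# Sketch: a subgoal value function updated by pseudo-reward prediction errors
# (PRPEs), i.e. TD(0) where attaining the subgoal pays a pseudo-reward although
# the external reward is zero.
GAMMA, ALPHA, PSEUDO_REWARD = 0.95, 0.1, 1.0
V_sub = {"s0": 0.0, "s1": 0.0, "subgoal": 0.0}

def prpe_update(s, s_next, reached_subgoal):
    """TD(0) on the subgoal value function; the PRPE is the TD error itself."""
    pseudo_r = PSEUDO_REWARD if reached_subgoal else 0.0
    prpe = pseudo_r + GAMMA * V_sub[s_next] - V_sub[s]
    V_sub[s] += ALPHA * prpe
    return prpe

for _ in range(50):                     # replay a fixed s0 -> s1 -> subgoal path
    prpe_update("s0", "s1", reached_subgoal=False)
    prpe_update("s1", "subgoal", reached_subgoal=True)
print(V_sub)
```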

Affordance as general value function: A computational model [article]

Daniel Graves, Johannes Günther, Jun Luo
2021 arXiv   pre-print
General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment.  ...  A systematic explication of this connection shows that GVFs and especially their deep learning embodiments (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental  ...  and reinforcement learning.  ... 
arXiv:2010.14289v3 fatcat:b5rj7mxw2vgqzgunlgvhs4sjzi
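
A general value function keeps the TD machinery but replaces the reward with an arbitrary cumulant and the discount with a state-dependent continuation signal. The sketch below shows that generalized TD(0) form; the specific cumulant and continuation are arbitrary choices to illustrate the idea, not the affordance predictions studied in the paper.

```python
# Sketch of a general value function (GVF): TD(0) prediction where the reward is
# replaced by a cumulant c(s) and the discount by a state-dependent continuation.
import numpy as np

def gvf_td0(transitions, cumulant, continuation, alpha=0.1, n_states=5):
    """transitions: iterable of (s, s_next) index pairs under the target policy."""
    v = np.zeros(n_states)
    for s, s_next in transitions:
        target = cumulant(s) + continuation(s_next) * v[s_next]
        v[s] += alpha * (target - v[s])
    return v

# Example question: "how much of signal c will I accumulate before termination?"
cumulant = lambda s: 1.0 if s == 2 else 0.0        # predict visits to state 2
continuation = lambda s: 0.0 if s == 4 else 0.9    # state 4 terminates the question
chain = [(i, i + 1) for i in range(4)] * 100       # repeatedly walk the chain 0..4
print(gvf_td0(chain, cumulant, continuation))
```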
Showing results 1 — 15 out of 27,076 results