111,178 Hits in 5.0 sec

Randomized Exploration for Reinforcement Learning with General Value Function Approximation [article]

Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
2021 arXiv   pre-print
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.  ...  We complement the theory with an empirical evaluation across known difficult exploration tasks.  ...  The work was initiated when HI, ZY, ZW and LY were visiting the Simons Institute for the Theory of Computing at UC-Berkeley (Theory of Reinforcement Learning Program).  ... 
arXiv:2106.07841v2 fatcat:khusudk24besdhqfhvulplsn5q
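The abstract above names RLSVI. A minimal numpy sketch of its core step, assuming linear value features and Gaussian noise (the function name, feature dimension, and toy data below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def rlsvi_step(phi, targets, sigma=1.0, lam=1.0):
    """One randomized least-squares value update (RLSVI core idea).

    phi:     (n, d) feature matrix of visited state-action pairs
    targets: (n,)   regression targets r + max_a' Q(s', a')
    Returns a weight vector sampled from a Gaussian centred on the
    ridge-regression solution; acting greedily w.r.t. the sampled
    weights yields randomized exploration.
    """
    d = phi.shape[1]
    cov = np.linalg.inv(phi.T @ phi / sigma**2 + lam * np.eye(d))
    mean = cov @ phi.T @ targets / sigma**2
    return rng.multivariate_normal(mean, cov)

# toy usage: 2 features, noisy linear targets
phi = rng.normal(size=(50, 2))
w_true = np.array([1.0, -2.0])
y = phi @ w_true + 0.1 * rng.normal(size=50)
w_sample = rlsvi_step(phi, y)   # a random value function near the LS fit
```

The sampled covariance shrinks as data accumulates, so exploration fades where the value estimate is well determined.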

Reinforcement learning of multiple tasks using parametric bias

Leszek Rybicki, Yuuya Sugita, Jun Tani
2009 2009 International Joint Conference on Neural Networks  
While exploring a task, the agent builds its internal model of the environment and approximates a state value function.  ...  For learning multiple tasks, we use a parametric bias switching mechanism in which the value of the parametric bias layer identifies the task for the agent.  ...  While the agent explores the environment separately for the tasks, we propose that it use a single learning model for value function approximation, state prediction and policy generation.  ... 
doi:10.1109/ijcnn.2009.5178868 dblp:conf/ijcnn/RybickiST09 fatcat:hh456v66tzbk3j2fgj2nx2t3vy

Reinforcement Learning using Augmented Neural Networks [article]

Jack Shannon, Marek Grzes
2018 arXiv   pre-print
Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions.  ...  In this paper, we show that simple modifications to the structure of the neural network can improve stability of DQN learning when a multi-layer perceptron is used for function approximation.  ...  In general, neural networks with global basis functions, e.g., MLPs, are useful in reinforcement learning as they cope with the curse of dimensionality, which is a challenge for local basis functions.  ... 
arXiv:1806.07692v1 fatcat:fdje4hmowzgexgb34fclhfwvau

Deep Successor Reinforcement Learning [article]

Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
2016 arXiv   pre-print
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms.  ...  In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework.  ...  This is essential for stable Q-learning with function approximations (see [22]).  ... 
arXiv:1606.02396v1 fatcat:st7mhic7azgebcdr75ahac5kpi
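DSR builds on the successor representation (SR), which factors the value function into expected discounted state occupancies and a reward weight vector. A tabular sketch of that factorisation on an assumed toy chain MDP (names and dynamics are illustrative, not from the paper):

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.5
M = np.eye(n_states)        # successor representation: M[s] ~ expected
                            # discounted future occupancy of each state
w = np.zeros(n_states)      # reward weights; value factors as V = M @ w

def sr_td_update(s, s_next, r):
    """TD update for the SR (a Bellman equation in occupancy space)
    plus a running estimate of the per-state reward weights."""
    onehot = np.eye(n_states)[s]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    w[s_next] += alpha * (r - w[s_next])

# deterministic chain 0 -> 1 -> ... -> 4, reward 1 on reaching state 4
for _ in range(200):
    for s in range(n_states - 1):
        sr_td_update(s, s + 1, 1.0 if s + 1 == n_states - 1 else 0.0)

V = M @ w   # values recovered from the SR factorisation
```

Because rewards enter only through `w`, the same learned `M` can re-evaluate a policy under a new reward function without relearning the dynamics part.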

Integral Equations and Machine Learning [article]

Alexander Keller, Ken Dahm
2019 arXiv   pre-print
a loss function for that purpose.  ...  As both light transport simulation and reinforcement learning are ruled by the same Fredholm integral equation of the second kind, reinforcement learning techniques may be used for photorealistic image  ...  Acknowledgements The authors thank Anton Kaplanyan, Thomas Müller, and Fabrice Rouselle for profound discussions and advice.  ... 
arXiv:1712.06115v3 fatcat:co7l5padjbawjmsbgvkyeck2lu
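The structural identity the snippet alludes to can be written out explicitly; the symbols below follow standard textbook conventions rather than the paper's notation. Both the rendering equation and the Bellman equation have the Fredholm form "unknown = source term + integral of a kernel against the unknown":

```latex
% Fredholm integral equation of the second kind:
f(x) = g(x) + \int k(x, y)\, f(y)\, \mathrm{d}y

% Rendering equation (light transport):
L_o(x, \omega) = L_e(x, \omega)
  + \int_{\Omega} f_r(x, \omega_i, \omega)\, L_i(x, \omega_i)\,
    \cos\theta_i \, \mathrm{d}\omega_i

% Bellman expectation equation (reinforcement learning):
V^{\pi}(s) = \mathbb{E}_{a \sim \pi}\!\left[ r(s, a)
  + \gamma \int P(s' \mid s, a)\, V^{\pi}(s')\, \mathrm{d}s' \right]
```

In each case the outgoing quantity ($f$, $L_o$, $V^{\pi}$) is the source term ($g$, emitted radiance $L_e$, immediate reward) plus a kernel-weighted integral over the same unknown, which is why solution techniques can transfer between the two fields.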

Exploration by Distributional Reinforcement Learning [article]

Yunhao Tang, Shipra Agrawal
2018 arXiv   pre-print
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.  ...  We show that our proposed framework conceptually unifies multiple previous methods in exploration.  ...  Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the  ... 
arXiv:1805.01907v2 fatcat:o2rssp5tnnhcrjvpx4fiz23upu

Self-Organisation of Generic Policies in Reinforcement Learning

Simón Smith, J. Michael Herrmann
2013 Advances in Artificial Life, ECAL 2013  
We propose the use of an exploratory self-organised policy to initialise the parameters of the function approximation in the reinforcement learning policy based on the value function of the exploratory  ...  Results show that the initialisation based on the exploratory value function improves the learning speed in the low-dimensional task and that some correlation towards a higher reward can be acquired in  ...  The basis for the comparison is the reinforcement learning initialised with random weights in the function approximator.  ... 
doi:10.7551/978-0-262-31709-2-ch091 dblp:conf/ecal/SmithH13 fatcat:ac3c6atjpfcp5pya3fvyru2vei

Exploration by Distributional Reinforcement Learning

Yunhao Tang, Shipra Agrawal
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.  ...  We show that our proposed framework conceptually unifies multiple previous methods in exploration.  ...  Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the  ... 
doi:10.24963/ijcai.2018/376 dblp:conf/ijcai/Tang018 fatcat:e6ycjj7njzad5c4d2le6s44smm

Deep Exploration via Randomized Value Functions [article]

Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen
2019 arXiv   pre-print
We study the use of randomized value functions to guide deep exploration in reinforcement learning.  ...  This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning.  ...  , and more broadly, students who participated in Stanford University's 2017 and 2018 offerings of Reinforcement Learning, for feedback and stimulating discussions on this work.  ... 
arXiv:1703.07608v5 fatcat:vwihhabalzfe3daoek4dih6efu

Bootstrapped Thompson Sampling and Deep Exploration [article]

Ian Osband, Benjamin Van Roy
2015 arXiv   pre-print
The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes  ...  We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling.  ...  By contrast, some of the most successful applications of reinforcement learning generalize using nonlinearly parameterized models, like deep neural networks, that approximate the state-action value function  ... 
arXiv:1507.00300v1 fatcat:c6lyaztjjvhd3jvmc6xfm72yc4
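The bootstrap idea above is easiest to see on a multi-armed bandit: instead of sampling from an explicit posterior, score each arm by the mean of a bootstrap resample of its observed rewards. A toy sketch with assumed Bernoulli arms; the single artificial optimistic observation per arm is our addition to keep exploration alive (plain bootstrap resampling can get stuck), not necessarily the paper's exact prescription:

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = [0.3, 0.7]            # toy Bernoulli arms (assumed example)
# seed each arm with one artificial optimistic reward so an early run of
# zeros cannot permanently freeze the bootstrap estimate at zero
history = [[1.0] for _ in true_means]
pulls = [0, 0]

for t in range(2000):
    # bootstrapped Thompson sampling: score each arm by the mean of a
    # with-replacement resample of its reward history
    scores = [rng.choice(h, size=len(h), replace=True).mean() for h in history]
    arm = int(np.argmax(scores))
    reward = float(rng.random() < true_means[arm])
    history[arm].append(reward)
    pulls[arm] += 1
```

As an arm's history grows, its bootstrap means concentrate, so the randomness in the scores plays the same role as posterior sampling in standard Thompson sampling.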

A hybrid architecture for function approximation

Hassab Elgawi
2008 2008 6th IEEE International Conference on Industrial Informatics  
Friedman), a standard reinforcement learning benchmark on which several linear function approximators have previously performed poorly.  ...  This paper proposes a new approach to build a value function estimation based on a combination of temporal-difference (TD) learning and an on-line variant of Random Forest (RF).  ...  RANDOM FORESTS IN REINFORCEMENT LEARNING In this section we explore the possibility of using our on-line RF algorithm for function approximation.  ... 
doi:10.1109/indin.2008.4618267 fatcat:mevmnn5ktngkzo3m6c3riyx6qu

A Unifying Framework for Reinforcement Learning and Planning [article]

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker
2022 arXiv   pre-print
Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have  ...  Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.  ...  • Value update with function approximation: The same principles apply for gradient-based updates in the context of function approximation.  ... 
arXiv:2006.15009v4 fatcat:5qsancoivncgjcooefg3nmcedi

Selective Perception As a Mechanism To Adapt Agents To The Environment: An Evolutionary Approach

Mirza Ramicic, Andrea Bonarini
2019 IEEE Transactions on Cognitive and Developmental Systems  
In most of the implementations of reinforcement learning facing this type of data, approximation is obtained by neural networks and the process of drawing information from data is mediated by a short-term  ...  To improve learning and manage these data, approximated models and memory mechanisms are adopted.  ...  Fig. 1. General learning model architecture including attention focus block: (a) Replay memory; (b) Main learning loop; (d) A block implementing the main Q-value function approximator neural network  ... 
doi:10.1109/tcds.2019.2896306 fatcat:to5i6cr37nfyzmtuk6bfffmidq

Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks [article]

Fabio Pardo, Vitaly Levdik, Petar Kormushev
2020 arXiv   pre-print
To tackle this problem we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once.  ...  As an example of application we show that replacing the random actions in epsilon-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge  ...  Exploration with random goals As a simple application case for a Q-map we propose an exploration method that replaces the noisy random actions frequently used to explore in reinforcement learning with  ... 
arXiv:1810.02927v2 fatcat:4pixnbkpdvflxeic3ksesl2iqi
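The exploration variant described in the snippet replaces the uniform-random branch of epsilon-greedy with an action towards a randomly sampled feasible goal. A small sketch of that control flow; in the paper the goal-directed actions come from the learned Q-map, so the `action_towards` callable below is an assumed stand-in, and the toy usage values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions = 4

def goal_directed_epsilon_greedy(q_values, action_towards, goals, eps=0.1):
    """Epsilon-greedy where the exploratory branch heads towards a
    randomly chosen feasible goal instead of acting uniformly at random.

    q_values:       (n_actions,) task Q-values for the current state
    action_towards: callable goal -> greedy action towards that goal
                    (stands in for the paper's Q-map)
    goals:          list of candidate goal states
    """
    if rng.random() < eps:
        goal = goals[rng.integers(len(goals))]
        return action_towards(goal)
    return int(np.argmax(q_values))

# toy usage: goals are integers, "towards" just maps a goal to an action id
a = goal_directed_epsilon_greedy(np.array([0.1, 0.5, 0.2, 0.0]),
                                 lambda g: g % n_actions, [2, 7, 11])
```

The point of the substitution is that exploratory steps form coherent multi-step trajectories towards goals rather than uncorrelated random actions.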

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
2020 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning.  ...  Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions.  ...  Acknowledgement We thank Benjamin Van Roy, Chengshu Li for the insightful discussions, and Rui Du for comments on the earlier drafts.  ... 
doi:10.1609/aaai.v34i04.6055 fatcat:i44w4jmidrfltnbatyw6no3ie4
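Ensemble sampling, mentioned in the abstract above, approximates posterior sampling by keeping several independently initialised value estimates and acting greedily with respect to one of them at a time. A tabular sketch under assumed toy sizes (all names and constants below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, K = 4, 2, 5
gamma, alpha = 0.9, 0.1

# ensemble of independently initialised Q-tables; disagreement between
# members is what provides the exploration signal
ensemble = [rng.normal(scale=0.1, size=(n_states, n_actions))
            for _ in range(K)]

def act(state, k):
    """Act greedily w.r.t. ensemble member k, sampled once per episode."""
    return int(np.argmax(ensemble[k][state]))

def q_update(k, s, a, r, s_next):
    """Plain Q-learning update applied to a single ensemble member."""
    target = r + gamma * ensemble[k][s_next].max()
    ensemble[k][s, a] += alpha * (target - ensemble[k][s, a])

# drive one member's estimate towards a fixed bootstrap target
for _ in range(1000):
    q_update(0, 0, 0, 1.0, 1)
```

Sampling a member index per episode (rather than per step) is what makes the resulting exploration temporally extended, in the spirit of Thompson sampling.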
Showing results 1 — 15 out of 111,178 results