111,414 Hits in 6.0 sec

Reinforcement learning of multiple tasks using parametric bias

Leszek Rybicki, Yuuya Sugita, Jun Tani
2009 International Joint Conference on Neural Networks
While exploring a task, the agent builds its internal model of the environment and approximates a state value function.  ...  We propose a reinforcement learning system designed to learn multiple different continuous state-action-space tasks.  ...  DISCUSSION The method described in this paper is capable of learning multiple continuous state space reinforcement learning tasks and generates good approximations of the optimal value function using very  ... 
doi:10.1109/ijcnn.2009.5178868 dblp:conf/ijcnn/RybickiST09 fatcat:hh456v66tzbk3j2fgj2nx2t3vy
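
The parametric-bias mechanism named in the title conditions a single shared network on a small per-task bias vector that is adapted alongside the weights. A minimal sketch of that conditioning, assuming a feed-forward value network; `PBValueNet`, the layer sizes, and all names below are illustrative, not taken from the paper:

```python
# Hypothetical sketch: one value network shared across tasks, conditioned on
# a learnable parametric-bias vector per task. Sizes are illustrative.
import torch
import torch.nn as nn

class PBValueNet(nn.Module):
    def __init__(self, state_dim: int, pb_dim: int, n_tasks: int, hidden: int = 64):
        super().__init__()
        # One learnable parametric-bias vector per task.
        self.pb = nn.Parameter(torch.zeros(n_tasks, pb_dim))
        self.net = nn.Sequential(
            nn.Linear(state_dim + pb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, task_id: int) -> torch.Tensor:
        # Broadcast the task's bias vector across the batch and concatenate.
        pb = self.pb[task_id].expand(state.shape[0], -1)
        return self.net(torch.cat([state, pb], dim=-1))
```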

Reinforcement Learning using Augmented Neural Networks [article]

Jack Shannon, Marek Grzes
2018 arXiv   pre-print
Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions.  ...  In this paper, we show that simple modifications to the structure of the neural network can improve stability of DQN learning when a multi-layer perceptron is used for function approximation.  ...  Using neural networks with global basis functions as an approximation function destroys optimistic initialisation that forms an important exploration strategy in reinforcement learning.  ... 
arXiv:1806.07692v1 fatcat:fdje4hmowzgexgb34fclhfwvau
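
For context on the snippet's setup, a multi-layer perceptron used as a DQN-style function approximator maps a state vector to one Q-value per discrete action. A minimal sketch with illustrative sizes; the paper's augmented architecture is not reproduced here:

```python
# Minimal DQN-style Q-network: an MLP from a state vector to one Q-value
# per discrete action. Layer widths are illustrative.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # Q(s, a) for every action a
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection under the approximate Q-function:
#   action = q_network(state).argmax(dim=-1)
```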

Integral Equations and Machine Learning [article]

Alexander Keller, Ken Dahm
2019 arXiv   pre-print
In the light of the recent advances in reinforcement learning for playing games, we investigate the representation of an approximate solution of an integral equation by artificial neural networks and derive  ...  As both light transport simulation and reinforcement learning are ruled by the same Fredholm integral equation of the second kind, reinforcement learning techniques may be used for photorealistic image  ...  Learning Next Event Estimation Recent research [15, 16] has shown that deep artificial neural networks [3] very successfully can approximate value and policy functions in temporal difference learning  ... 
arXiv:1712.06115v3 fatcat:co7l5padjbawjmsbgvkyeck2lu
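
The structural correspondence the snippet points to can be written out: both the light-transport (rendering) equation and the Bellman expectation equation are Fredholm integral equations of the second kind, where the unknown appears both outside and inside the integral. A sketch in generic symbols rather than the paper's notation:

```latex
% Fredholm integral equation of the second kind for an unknown u:
u(x) = f(x) + \int_{\Omega} K(x, y)\, u(y)\, \mathrm{d}y

% The Bellman expectation equation for Q^\pi has the same shape, with the
% transition kernel and policy together playing the role of the kernel K:
Q^{\pi}(s, a) = r(s, a)
  + \gamma \int_{\mathcal{S}} p(s' \mid s, a) \int_{\mathcal{A}} \pi(a' \mid s')\, Q^{\pi}(s', a')\, \mathrm{d}a'\, \mathrm{d}s'
```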

Deep Successor Reinforcement Learning [article]

Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
2016 arXiv   pre-print
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms.  ...  In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework.  ...  This is essential for stable Q-learning with function approximations (see [22] ).  ... 
arXiv:1606.02396v1 fatcat:st7mhic7azgebcdr75ahac5kpi
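
The successor representation (SR) that DSR extends factors the value function into expected discounted state occupancies and an immediate-reward model. In generic notation for a discrete state space (not the paper's exact symbols):

```latex
% Expected discounted future occupancy of state s' under policy \pi:
M^{\pi}(s, a, s') = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathbf{1}[s_t = s'] \;\middle|\; s_0 = s,\ a_0 = a \right]

% The action-value function then factors through the reward function R:
Q^{\pi}(s, a) = \sum_{s'} M^{\pi}(s, a, s')\, R(s')
```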

Exploration by Distributional Reinforcement Learning [article]

Yunhao Tang, Shipra Agrawal
2018 arXiv   pre-print
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.  ...  We show that our proposed framework conceptually unifies multiple previous methods in exploration.  ...  Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the  ... 
arXiv:1805.01907v2 fatcat:o2rssp5tnnhcrjvpx4fiz23upu
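
A hedged sketch of the randomized-value-function idea the snippet builds on: maintain an approximate Gaussian posterior over the weights of a linear Q-function, draw one sample per episode, and act greedily under the sample. This illustrates the general mechanism, not the paper's specific distributional-RL algorithm:

```python
# Exploration via a sampled value function: one posterior draw per episode,
# greedy actions under that draw. All sizes and scales are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 8, 4                       # feature and action counts (illustrative)
mu = np.zeros((n_actions, d))             # approximate posterior mean per action
sigma = np.ones((n_actions, d))           # diagonal posterior std per action

def sample_q_weights() -> np.ndarray:
    return mu + sigma * rng.standard_normal(mu.shape)

def act(features: np.ndarray) -> int:
    w = sample_q_weights()                # held fixed for the whole episode
    return int(np.argmax(w @ features))   # greedy w.r.t. the sampled Q
```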

Self-Organisation of Generic Policies in Reinforcement Learning

Simón Smith, J. Michael Herrmann
2013 Advances in Artificial Life, ECAL 2013  
We propose the use of an exploratory self-organised policy to initialise the parameters of the function approximation in the reinforcement learning policy based on the value function of the exploratory  ...  Results show that the initialisation based on the exploratory value function improves the learning speed in the low-dimensional task and that some correlation towards a higher reward can be acquired in  ...  The basis for the comparison is the reinforcement learning initialised with random weights in the function approximator.  ... 
doi:10.7551/978-0-262-31709-2-ch091 dblp:conf/ecal/SmithH13 fatcat:ac3c6atjpfcp5pya3fvyru2vei
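
The initialisation step the snippet describes can be sketched as a supervised fit: regress the approximator's weights onto the value function estimated during the exploratory phase, then start reinforcement learning from those weights instead of random ones. A minimal sketch for a linear approximator; names are illustrative:

```python
# Fit linear function-approximator weights to an exploratory value function,
# to be used as the RL initialisation instead of random weights.
import numpy as np

def init_from_exploratory_values(phi: np.ndarray, v_explore: np.ndarray) -> np.ndarray:
    """phi: (n_states, d) feature matrix; v_explore: (n_states,) exploratory values."""
    # Least-squares fit: weights w such that phi @ w ~= v_explore.
    w, *_ = np.linalg.lstsq(phi, v_explore, rcond=None)
    return w  # initial weights for the RL function approximator
```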

Exploration by Distributional Reinforcement Learning

Yunhao Tang, Shipra Agrawal
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.  ...  We show that our proposed framework conceptually unifies multiple previous methods in exploration.  ...  Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the  ... 
doi:10.24963/ijcai.2018/376 dblp:conf/ijcai/Tang018 fatcat:e6ycjj7njzad5c4d2le6s44smm

Deep Exploration via Randomized Value Functions [article]

Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen
2019 arXiv   pre-print
We study the use of randomized value functions to guide deep exploration in reinforcement learning.  ...  We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies.  ...  , and more broadly, students who participated in Stanford University's 2017 and 2018 offerings of Reinforcement Learning, for feedback and stimulating discussions on this work.  ... 
arXiv:1703.07608v5 fatcat:vwihhabalzfe3daoek4dih6efu
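
In the linear case studied in this line of work (RLSVI), the randomized value function is a draw from a Bayesian linear-regression posterior over value weights, taken once per episode. A minimal sketch with illustrative prior and noise scales:

```python
# RLSVI-style randomized value function for linear features: sample weights
# from the Gaussian posterior implied by ridge regression on TD targets.
import numpy as np

rng = np.random.default_rng(0)

def rlsvi_sample(Phi: np.ndarray, targets: np.ndarray,
                 prior_var: float = 1.0, noise_var: float = 0.1) -> np.ndarray:
    """Phi: (n, d) features of visited state-actions; targets: (n,) TD targets."""
    d = Phi.shape[1]
    cov = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(d) / prior_var)
    mean = cov @ Phi.T @ targets / noise_var
    return rng.multivariate_normal(mean, cov)  # one randomized weight vector
```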

A hybrid architecture for function approximation

Hassab Elgawi
2008 6th IEEE International Conference on Industrial Informatics
Friedman), a standard reinforcement learning benchmark on which several linear function approximators have previously performed poorly.  ...  This paper proposes a new approach to build a value function estimation based on a combination of temporal-difference (TD) learning and an on-line variant of Random Forest (RF).  ...  RANDOM FORESTS IN REINFORCEMENT LEARNING In this section we explore the possibility of using our on-line RF algorithm for function approximation.  ... 
doi:10.1109/indin.2008.4618267 fatcat:mevmnn5ktngkzo3m6c3riyx6qu
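
A hedged sketch of the TD-plus-forest combination the snippet describes. scikit-learn's forest is batch-trained, so this version refits on TD targets each iteration rather than updating truly on-line, which is a simplification of the paper's on-line RF variant:

```python
# Fitted TD(0) policy evaluation with a Random Forest value estimator.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_td(transitions, n_iters: int = 10, gamma: float = 0.99):
    """transitions: list of (state_vec, reward, next_state_vec) tuples."""
    s = np.array([t[0] for t in transitions])
    r = np.array([t[1] for t in transitions])
    s2 = np.array([t[2] for t in transitions])
    v = np.zeros(len(transitions))          # initial value targets
    model = RandomForestRegressor(n_estimators=50)
    for _ in range(n_iters):
        model.fit(s, v)                     # fit forest to current targets
        v = r + gamma * model.predict(s2)   # TD(0) targets from current forest
    return model
```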

Bootstrapped Thompson Sampling and Deep Exploration [article]

Ian Osband, Benjamin Van Roy
2015 arXiv   pre-print
The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes  ...  We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling.  ...  By contrast, some of the most successful applications of reinforcement learning generalize using nonlinearly parameterized models, like deep neural networks, that approximate the state-action value function  ... 
arXiv:1507.00300v1 fatcat:c6lyaztjjvhd3jvmc6xfm72yc4
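
The core trick is that a bootstrap resample of the observed data can stand in for a posterior draw. A minimal sketch on a Bernoulli bandit, with illustrative arm means and an optimistic seed observation per arm; the paper's deep-learning instantiation is not reproduced:

```python
# Bootstrapped Thompson sampling on a 3-armed Bernoulli bandit: act greedily
# on means computed from a fresh bootstrap resample of each arm's history.
import numpy as np

rng = np.random.default_rng(0)
true_p = [0.3, 0.5, 0.7]                 # hidden arm means (for simulation only)
history = [[1.0], [1.0], [1.0]]          # optimistic seed observation per arm

for t in range(1000):
    # Bootstrap resample per arm: plays the role of a posterior sample.
    sampled_means = [np.mean(rng.choice(h, size=len(h))) for h in history]
    arm = int(np.argmax(sampled_means))
    reward = float(rng.random() < true_p[arm])
    history[arm].append(reward)
```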

Selective Perception As a Mechanism To Adapt Agents To The Environment: An Evolutionary Approach

Mirza Ramicic, Andrea Bonarini
2019 IEEE Transactions on Cognitive and Developmental Systems  
In most of the implementations of reinforcement learning facing this type of data, approximation is obtained by neural networks and the process of drawing information from data is mediated by a short-term  ...  Learning agents may get data ranging on real intervals directly from the environment they interact with, in a process usually time-expensive.  ...  Fig. 1. General learning model architecture including attention focus block: (a) Replay memory; (b) Main learning loop; (d) A block implementing main Q-value function approximator neural network;  ... 
doi:10.1109/tcds.2019.2896306 fatcat:to5i6cr37nfyzmtuk6bfffmidq
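
A hedged sketch of the selective-perception idea from the snippet: a replay memory that only admits transitions passing a selection criterion. The TD-error threshold below is an illustrative stand-in for the paper's evolved attention-focus mechanism, not its actual criterion:

```python
# Replay memory with a selection gate on incoming transitions.
import random
from collections import deque

class SelectiveReplayMemory:
    def __init__(self, capacity: int = 10000, threshold: float = 0.1):
        self.buffer = deque(maxlen=capacity)
        self.threshold = threshold

    def maybe_store(self, transition, td_error: float) -> bool:
        # Attention gate: keep only transitions judged informative enough.
        if abs(td_error) >= self.threshold:
            self.buffer.append(transition)
            return True
        return False

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)
```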

A Unifying Framework for Reinforcement Learning and Planning [article]

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker
2022 arXiv   pre-print
Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.  ...  Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have  ...  Generalization implies that similar inputs (states) to a function will in general also have approximately similar output (policy or value) predictions.  ... 
arXiv:2006.15009v4 fatcat:5qsancoivncgjcooefg3nmcedi

Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks [article]

Fabio Pardo, Vitaly Levdik, Petar Kormushev
2020 arXiv   pre-print
As an example of application we show that replacing the random actions in epsilon-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge  ...  An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy.  ...  Acknowledgments The research presented in this paper has been supported by Dyson Technology Ltd. and computation resources were provided by Microsoft Azure.  ... 
arXiv:1810.02927v2 fatcat:4pixnbkpdvflxeic3ksesl2iqi
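
The all-goals update in the snippet can be illustrated in tabular form: one transition updates goal-conditioned Q-values for every goal at once, off-policy. The reward convention below (1 on reaching the goal, 0 otherwise) is one common choice; the paper's convolutional scaling is not reproduced:

```python
# Tabular all-goals Q-learning update from a single transition (s, a, s2).
import numpy as np

def all_goals_update(Q: np.ndarray, s: int, a: int, s2: int,
                     n_goals: int, alpha: float = 0.1, gamma: float = 0.99):
    """Q has shape (n_goals, n_states, n_actions)."""
    for g in range(n_goals):
        if s2 == g:
            target = 1.0                      # goal reached: terminal reward
        else:
            target = gamma * Q[g, s2].max()   # bootstrap towards goal g
        Q[g, s, a] += alpha * (target - Q[g, s, a])
```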

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning.  ...  Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions.  ...  The RLSVI algorithm approximates the posterior sampling for exploration by using randomized value functions sampled from a posterior distribution.  ... 
doi:10.1609/aaai.v34i04.6055 fatcat:i44w4jmidrfltnbatyw6no3ie4
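
The indexed value function named in the title can be summarised generically: instead of maintaining a full ensemble, a single parameterized network takes a random index as an extra input, and one index draw per episode plays the role of a posterior sample. In generic notation, not the paper's:

```latex
% One index draw per episode stands in for a posterior sample of the value
% function; actions are greedy under the indexed Q:
z \sim \mathcal{N}(0, I), \qquad a_t = \arg\max_{a}\; Q_{\theta}(s_t, a;\, z)
```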

Randomized Prior Functions for Deep Reinforcement Learning [article]

Ian Osband, John Aslanides, Albin Cassirer
2018 arXiv   pre-print
Dealing with uncertainty is essential for efficient reinforcement learning.  ...  We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member.  ...  This paper can be thought of as a specific type of 'deep exploration via randomized value functions', whose line of research has been crucially driven by the contributions of (and conversations with) Benjamin  ... 
arXiv:1806.03335v2 fatcat:zkly3q224zad5cpqk7esoazr3e
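
The remedy the snippet names is concrete enough to sketch: each ensemble member's value estimate is the sum of a trainable network and a frozen, randomly initialised prior network, with gradients flowing only through the trainable part. Architecture and the beta scale below are illustrative:

```python
# Q-network with a randomized, untrainable prior: Q(s) = f(s) + beta * p(s).
import torch
import torch.nn as nn

def make_net(state_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

class PriorQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, beta: float = 3.0):
        super().__init__()
        self.trainable = make_net(state_dim, n_actions)
        self.prior = make_net(state_dim, n_actions)
        for p in self.prior.parameters():      # prior stays fixed at its
            p.requires_grad_(False)            # random initialisation
        self.beta = beta

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.trainable(state) + self.beta * self.prior(state)
```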
Showing results 1 — 15 out of 111,414 results