Reinforcement learning of multiple tasks using parametric bias
2009
2009 International Joint Conference on Neural Networks
While exploring a task, the agent builds its internal model of the environment and approximates a state value function. ...
We propose a reinforcement learning system designed to learn multiple different continuous state-action-space tasks. ...
DISCUSSION The method described in this paper is capable of learning multiple continuous state space reinforcement learning tasks and generates good approximations of the optimal value function using very ...
doi:10.1109/ijcnn.2009.5178868
dblp:conf/ijcnn/RybickiST09
fatcat:hh456v66tzbk3j2fgj2nx2t3vy
Reinforcement Learning using Augmented Neural Networks
[article]
2018
arXiv
pre-print
Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions. ...
In this paper, we show that simple modifications to the structure of the neural network can improve stability of DQN learning when a multi-layer perceptron is used for function approximation. ...
Using neural networks with global basis functions as an approximation function destroys optimistic initialisation that forms an important exploration strategy in reinforcement learning. ...
arXiv:1806.07692v1
fatcat:fdje4hmowzgexgb34fclhfwvau
Integral Equations and Machine Learning
[article]
2019
arXiv
pre-print
In the light of the recent advances in reinforcement learning for playing games, we investigate the representation of an approximate solution of an integral equation by artificial neural networks and derive ...
As both light transport simulation and reinforcement learning are ruled by the same Fredholm integral equation of the second kind, reinforcement learning techniques may be used for photorealistic image ...
Learning Next Event Estimation Recent research [15, 16] has shown that deep artificial neural networks [3] can very successfully approximate value and policy functions in temporal difference learning ...
arXiv:1712.06115v3
fatcat:co7l5padjbawjmsbgvkyeck2lu
Deep Successor Reinforcement Learning
[article]
2016
arXiv
pre-print
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. ...
In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. ...
This is essential for stable Q-learning with function approximations (see [22] ). ...
arXiv:1606.02396v1
fatcat:st7mhic7azgebcdr75ahac5kpi
Exploration by Distributional Reinforcement Learning
[article]
2018
arXiv
pre-print
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. ...
We show that our proposed framework conceptually unifies multiple previous methods in exploration. ...
Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the ...
arXiv:1805.01907v2
fatcat:o2rssp5tnnhcrjvpx4fiz23upu
Self-Organisation of Generic Policies in Reinforcement Learning
2013
Advances in Artificial Life, ECAL 2013
We propose the use of an exploratory self-organised policy to initialise the parameters of the function approximation in the reinforcement learning policy based on the value function of the exploratory ...
Results show that the initialisation based on the exploratory value function improves the learning speed in the low-dimensional task and that some correlation towards a higher reward can be acquired in ...
The basis for the comparison is the reinforcement learning initialised with random weights in the function approximator. ...
doi:10.7551/978-0-262-31709-2-ch091
dblp:conf/ecal/SmithH13
fatcat:ac3c6atjpfcp5pya3fvyru2vei
Exploration by Distributional Reinforcement Learning
2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. ...
We show that our proposed framework conceptually unifies multiple previous methods in exploration. ...
Following similar ideas of randomized value function, multiple recent works have combined approximate Bayesian inference [Ranganath et al., 2014; Blei et al., 2017] with Q learning and justified the ...
doi:10.24963/ijcai.2018/376
dblp:conf/ijcai/Tang018
fatcat:e6ycjj7njzad5c4d2le6s44smm
Deep Exploration via Randomized Value Functions
[article]
2019
arXiv
pre-print
We study the use of randomized value functions to guide deep exploration in reinforcement learning. ...
We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. ...
, and more broadly, students who participated in Stanford University's 2017 and 2018 offerings of Reinforcement Learning, for feedback and stimulating discussions on this work. ...
arXiv:1703.07608v5
fatcat:vwihhabalzfe3daoek4dih6efu
A hybrid architecture for function approximation
2008
2008 6th IEEE International Conference on Industrial Informatics
Friedman), a standard reinforcement learning benchmark on which several linear function approximators have previously performed poorly. ...
This paper proposes a new approach to building a value function estimate based on a combination of temporal-difference (TD) learning and an on-line variant of Random Forest (RF). ...
RANDOM FORESTS IN REINFORCEMENT LEARNING In this section we explore the possibility of using our on-line RF algorithm for function approximation. ...
doi:10.1109/indin.2008.4618267
fatcat:mevmnn5ktngkzo3m6c3riyx6qu
Bootstrapped Thompson Sampling and Deep Exploration
[article]
2015
arXiv
pre-print
The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes ...
We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. ...
By contrast, some of the most successful applications of reinforcement learning generalize using nonlinearly parameterized models, like deep neural networks, that approximate the state-action value function ...
arXiv:1507.00300v1
fatcat:c6lyaztjjvhd3jvmc6xfm72yc4
Selective Perception As a Mechanism To Adapt Agents To The Environment: An Evolutionary Approach
2019
IEEE Transactions on Cognitive and Developmental Systems
In most of the implementations of reinforcement learning facing this type of data, approximation is obtained by neural networks and the process of drawing information from data is mediated by a short-term ...
Learning agents may get data ranging on real intervals directly from the environment they interact with, in a process usually time-expensive. ...
Fig. 1. General learning model architecture including attention focus block: (a) Replay memory; (b) Main learning loop; (d) A block implementing main Q-value function approximator neural network; ( ...
doi:10.1109/tcds.2019.2896306
fatcat:to5i6cr37nfyzmtuk6bfffmidq
A Unifying Framework for Reinforcement Learning and Planning
[article]
2022
arXiv
pre-print
Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning. ...
Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have ...
Generalization implies that similar inputs (states) to a function will in general also have approximately similar output (policy or value) predictions. ...
arXiv:2006.15009v4
fatcat:5qsancoivncgjcooefg3nmcedi
Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks
[article]
2020
arXiv
pre-print
As an example of application we show that replacing the random actions in epsilon-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge ...
An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. ...
Acknowledgments The research presented in this paper has been supported by Dyson Technology Ltd. and computation resources were provided by Microsoft Azure. ...
arXiv:1810.02927v2
fatcat:4pixnbkpdvflxeic3ksesl2iqi
Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
2020
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. ...
Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. ...
The RLSVI algorithm approximates posterior sampling for exploration by using randomized value functions sampled from a posterior distribution. ...
doi:10.1609/aaai.v34i04.6055
fatcat:i44w4jmidrfltnbatyw6no3ie4
Randomized Prior Functions for Deep Reinforcement Learning
[article]
2018
arXiv
pre-print
Dealing with uncertainty is essential for efficient reinforcement learning. ...
We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. ...
This paper can be thought of as a specific type of 'deep exploration via randomized value functions', whose line of research has been crucially driven by the contributions of (and conversations with) Benjamin ...
arXiv:1806.03335v2
fatcat:zkly3q224zad5cpqk7esoazr3e