META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning
[article]
2020
arXiv
pre-print
To improve the sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. ...
TD-learning with eligibility traces provides a way to do temporal credit assignment, i.e. decide which portion of a reward should be assigned to predecessor states that occurred at different previous times ...
... ∇_θ ln π(A|S; θ) // accumulating traces for z_θ; w ← w + α_w·δ·z_w; θ ← θ + α_θ·δ·z_θ; I ← I·γ(S′); S ← S′ ...
Chapter 3: Sample Efficiency of Temporal Difference Learning. Learning faster and more accurately ...
arXiv:2006.08906v1
fatcat:z4vsafqmm5dqlluues7bv26buy
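The updates quoted in this entry's snippet are the standard actor-critic updates with accumulating eligibility traces. A minimal sketch, assuming linear values, caller-supplied gradients, and a constant discount (the paper uses a state-dependent γ(S′)); illustrative, not the paper's code:

```python
import numpy as np

def ac_trace_step(w, theta, z_w, z_th, I, delta, grad_v, grad_lnpi,
                  gamma=0.99, lam_w=0.9, lam_th=0.9,
                  alpha_w=0.1, alpha_th=0.01):
    """One actor-critic step with accumulating eligibility traces.
    grad_v: gradient of v(S; w); grad_lnpi: gradient of ln pi(A|S; theta).
    Constant gamma is a simplification; the paper uses gamma(S')."""
    z_w = gamma * lam_w * z_w + grad_v             # critic trace
    z_th = gamma * lam_th * z_th + I * grad_lnpi   # accumulating actor trace
    w = w + alpha_w * delta * z_w                  # critic update
    theta = theta + alpha_th * delta * z_th        # actor update
    I = I * gamma                                  # discount accumulator
    return w, theta, z_w, z_th, I

# toy usage with one-hot gradients
n = 4
w, theta, z_w, z_th = (np.zeros(n) for _ in range(4))
w, theta, z_w, z_th, I = ac_trace_step(
    w, theta, z_w, z_th, I=1.0, delta=0.5,
    grad_v=np.eye(n)[2], grad_lnpi=np.eye(n)[2])
```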
Reinforcement Learning and its Connections with Neuroscience and Psychology
[article]
2021
arXiv
pre-print
In this paper, we comprehensively review a large number of findings in both neuroscience and psychology that evidence reinforcement learning as a promising candidate for modeling learning and decision ...
While there certainly has been considerable independent innovation to produce such results, many core ideas in reinforcement learning are inspired by phenomena in animal learning, psychology and neuroscience ...
in sample efficiency and robustness ...
arXiv:2007.01099v5
fatcat:mjpkztlmqnfjba3dtcwqwmmlvu
META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation
[article]
2020
arXiv
pre-print
For better sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. ...
TD-learning with eligibility traces provides a way to boost sample efficiency by temporal credit assignment, i.e. deciding which portion of a reward should be assigned to predecessor states that occurred ...
We are grateful to Compute Canada for providing a shared cluster for experimentation. ...
arXiv:1904.11439v6
fatcat:ivaaiiqsx5dbrjo3wcfljb2amm
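A minimal sketch of where a state-dependent trace parameter enters a tabular TD(λ) update. Here `lam` is a fixed array, whereas the paper meta-learns it by gradient descent, and decaying with λ of the current state is one common convention:

```python
import numpy as np

def td_step_state_lambda(V, z, s, r, s2, lam, alpha=0.1, gamma=0.99):
    """Tabular TD(lambda) with a state-dependent trace parameter lam[s].
    lam is fixed here; the paper adjusts it by meta-gradient descent."""
    delta = r + gamma * V[s2] - V[s]   # TD error
    z = gamma * lam[s] * z             # decay traces with this state's lambda
    z[s] += 1.0                        # accumulating trace for s
    V = V + alpha * delta * z
    return V, z

V, z = np.zeros(5), np.zeros(5)
lam = np.full(5, 0.9)                  # per-state lambda (assumed fixed)
V, z = td_step_state_lambda(V, z, s=0, r=1.0, s2=1, lam=lam)
```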
Learning to learn online with neuromodulated synaptic plasticity in spiking neural networks
[article]
2022
arXiv
pre-print
We propose that in order to harness our understanding of neuroscience toward machine learning, we must first have powerful tools for training brain-like models of learning. ...
with a framework of learning to learn through gradient descent to address challenging online learning problems. ...
Acknowledgments The program is funded by Office of the Under Secretary of Defense (OUSD) through the Applied Research for Advancement of S&T Priorities (ARAP) Program work unit 1U64. ...
arXiv:2206.12520v2
fatcat:tqohncoyvrdf5n7xumenrwlwle
TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
[article]
2018
arXiv
pre-print
In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. ...
Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. ...
Step-size adaptation in temporal-difference learning The problem of how to set step-sizes automatically is an important one for machine learning. ...
arXiv:1804.03334v1
fatcat:vspu4e3mg5dw3okjbfdj2mybie
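A simplified TIDBD(0) sketch, assuming linear function approximation and omitting the paper's trace machinery: per-weight step-sizes α_i = exp(β_i) are adapted by stochastic meta-descent on the TD error, with h tracking recent update correlations:

```python
import numpy as np

def tidbd0_step(w, beta, h, x, r, x2, meta=0.01, gamma=0.99):
    """Semi-gradient TIDBD(0): adapt log step-sizes beta by meta-descent,
    then take a TD(0) step with per-weight alphas. A sketch only."""
    delta = r + gamma * (w @ x2) - (w @ x)     # TD error
    beta = beta + meta * delta * x * h         # meta-descent on log step-sizes
    alpha = np.exp(beta)
    w = w + alpha * delta * x                  # TD(0) with per-weight alphas
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, beta, h

d = 3
w, beta, h = np.zeros(d), np.full(d, np.log(0.1)), np.zeros(d)
x, x2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])
w, beta, h = tidbd0_step(w, beta, h, x, r=1.0, x2=x2)
```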
A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
[article]
2016
arXiv
pre-print
No existing meta-learning method for λ can simultaneously achieve (1) incremental updating, (2) compatibility with function approximation, and (3) stability of learning under both on- and off-policy sampling ...
For temporal-difference learning algorithms which we study here, there is yet another parameter, λ, that similarly impacts learning speed and stability in practice. ...
Acknowledgements We would like to thank David Silver for helpful discussions and the reviewers for helpful comments. ...
arXiv:1607.00446v2
fatcat:dvvdietczjaslc4jjurey55fum
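For context, λ interpolates between the one-step TD target and the Monte Carlo return via the λ-return. A small helper (assuming the episode terminates, so the bootstrap value past the end is zero) makes the bias/variance trade-off that adapting λ must navigate concrete:

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.9):
    """Backward recursion G_t = r_t + gamma*((1-lam)*v(s_{t+1}) + lam*G_{t+1}).
    lam=0 gives the one-step TD target, lam=1 the Monte Carlo return."""
    G, out = 0.0, []
    for r, v in zip(reversed(rewards), reversed(next_values)):
        G = r + gamma * ((1.0 - lam) * v + lam * G)
        out.append(G)
    return out[::-1]

# e.g. a 3-step episode; next_values[-1] = 0 because s_T is terminal
print(lambda_returns([0.0, 0.0, 1.0], [0.5, 0.25, 0.0], lam=0.5))
```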
Meta-Gradient Reinforcement Learning
[article]
2018
arXiv
pre-print
Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. ...
We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst interacting and learning from the environment. ...
for their suggestions and comments on an early version of the paper. ...
arXiv:1805.09801v1
fatcat:mls5nqcgprbcpkdazc7fmnsuk4
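A toy, one-step illustration of the meta-gradient idea, assuming tabular values and γ as the meta-parameter: an inner TD(0) update with the current γ, then a meta step that differentiates a validation TD error (measured under a fixed reference γ̄) through the inner update. The paper's algorithm is richer (traces, full returns), so treat this purely as a sketch:

```python
import numpy as np

def meta_step_gamma(w, gamma, train, val, alpha=0.1, beta=0.01, gamma_bar=0.99):
    """One-step meta-gradient on the discount gamma (tabular sketch)."""
    s, r, s2 = train
    delta = r + gamma * w[s2] - w[s]
    w2 = w.copy()
    w2[s] += alpha * delta                        # inner TD(0) update
    dw2_dg = np.zeros_like(w)
    dw2_dg[s] = alpha * w[s2]                     # d w2 / d gamma (one-step approx.)

    sv, rv, sv2 = val
    delta_v = rv + gamma_bar * w2[sv2] - w2[sv]   # outer (validation) TD error
    dd_dw2 = np.zeros_like(w)
    dd_dw2[sv2] += gamma_bar
    dd_dw2[sv] -= 1.0
    gamma = gamma - beta * 2.0 * delta_v * (dd_dw2 @ dw2_dg)  # chain rule
    return w2, gamma

w, gamma = np.zeros(4), 0.9
w, gamma = meta_step_gamma(w, gamma, train=(0, 1.0, 1), val=(1, 0.0, 2))
```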
Off-policy Learning with Eligibility Traces: A Survey
[article]
2013
arXiv
pre-print
Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. ...
In the framework of Markov Decision Processes, off-policy learning, that is the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated ...
Given samples, well-known methods for estimating a value function are temporal difference (TD) learning and Monte Carlo (Sutton and Barto, 1998). ...
arXiv:1304.3999v1
fatcat:kagkj4bs7vd7nkt37nprlyogz4
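One of the corrections the survey systematizes is per-decision importance sampling inside the trace. A tabular sketch, where `pi` and `b` are assumed target- and behaviour-policy probability tables:

```python
import numpy as np

def offpolicy_td_lambda_step(V, z, s, a, r, s2, pi, b,
                             alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular off-policy TD(lambda) with per-decision importance sampling:
    the trace is rescaled by rho = pi(a|s)/b(a|s) each step. One variant
    among several the survey covers."""
    rho = pi[s, a] / b[s, a]               # importance-sampling ratio
    delta = r + gamma * V[s2] - V[s]       # TD error for the target policy
    z = rho * (gamma * lam * z)            # z = rho * (gamma*lam*z_prev + e_s)
    z[s] += rho
    V = V + alpha * delta * z
    return V, z

nS, nA = 3, 2
pi = np.full((nS, nA), 0.5)
b = np.array([[0.9, 0.1]] * nS)
V, z = np.zeros(nS), np.zeros(nS)
V, z = offpolicy_td_lambda_step(V, z, s=0, a=1, r=1.0, s2=2, pi=pi, b=b)
```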
Evolving interpretable plasticity for spiking networks
2021
eLife
We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradient-descent methods for learning ...
How these changes can be mathematically described at the phenomenological level, as so-called 'plasticity rules', is essential both for understanding biological information processing and for developing ...
the individual components of the eligibility trace. ...
doi:10.7554/elife.66273
pmid:34709176
pmcid:PMC8553337
fatcat:ekwsjg3gcndzbcgbuzb6y32q3y
Brain-inspired global-local learning incorporated with neuromorphic computing
[article]
2021
arXiv
pre-print
It can meta-learn local plasticity and receive top-down supervision information for multiscale synergic learning. ...
We demonstrate the advantages of this model in multiple different tasks, including few-shot learning, continual learning, and fault-tolerance learning in neuromorphic vision sensors. ...
Data and code availability All data used in this paper are publicly available and can be accessed at http://yann.lecun.com/exdb/mnist/ for the MNIST dataset, https://www.cs.toronto.edu/~kriz/cifar/ ...
arXiv:2006.03226v3
fatcat:rpx3rt56lzbzrhffdcipfxtuji
Online Off-policy Prediction
[article]
2018
arXiv
pre-print
The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling and function approximation may cause the value ...
for decades. ...
Sample efficient actor-critic with experience replay. ArXiv:1611.01224.
Yu, H. (2015). On convergence of emphatic temporal-difference learning. ...
arXiv:1811.02597v1
fatcat:qqkbocmp2bbjxlcb5r5wbou3vq
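The failure mode this entry refers to can be reproduced in a few lines with the classic "w, 2w" example (Sutton & Barto, sec. 11.2): bootstrapping, off-policy sampling, and linear function approximation together make a single weight diverge whenever γ > 0.5:

```python
# v(s1) = w and v(s2) = 2w share one weight; updating only the s1 -> s2
# transition (off-policy) while bootstrapping from v(s2) gives
# w <- w + alpha*(2*gamma - 1)*w, which diverges for gamma > 0.5.
w, alpha, gamma = 1.0, 0.1, 0.99
for _ in range(100):
    delta = 0.0 + gamma * (2.0 * w) - w   # r = 0, bootstrap target
    w += alpha * delta * 1.0              # feature of s1 is 1
print(w)  # grows without bound
```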
Selective Credit Assignment
[article]
2022
arXiv
pre-print
We describe a unified view on temporal-difference algorithms for selective credit assignment. These selective algorithms apply weightings to quantify the contribution of learning updates. ...
Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings. ...
The Q-learning algorithms illustrated in Fig. 1 use a form of temporal difference (TD) learning (Sutton, 1988a) to learn predictions online from sampled experience by bootstrapping on other predictions ...
arXiv:2202.09699v1
fatcat:26zcp3tku5hqfmhtiuojcjxw4a
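A minimal instance of such an update weighting, assuming a tabular value function and a fixed per-state interest vector; the paper's unified view covers far richer weightings (e.g. emphatic ones):

```python
import numpy as np

def weighted_td0_step(V, s, r, s2, interest, alpha=0.1, gamma=0.99):
    """TD(0) scaled by a per-state interest weighting i(s): the simplest
    instance of a 'selective' learning update."""
    delta = r + gamma * V[s2] - V[s]
    V[s] += alpha * interest[s] * delta   # zero interest => no update here
    return V

V = np.zeros(4)
interest = np.array([1.0, 0.0, 0.5, 1.0])  # assumed, for illustration
V = weighted_td0_step(V, s=2, r=1.0, s2=3, interest=interest)
```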
Darwinian embodied evolution of the learning ability for survival
2011
Adaptive Behavior
Q-learning is more difficult to combine with eligibility traces than Sarsa, because the learned policy, the greedy policy, differs from the policy used for selecting actions. ...
Examples of this approach are the Dyna algorithm by Sutton (1990), and Prioritized Sweeping by Moore and Atkeson (1993)
Eligibility Traces Eligibility traces is a basic mechanism for temporal credit ...
doi:10.1177/1059712310397633
fatcat:2r5mx4nh3rdvliamrqo5ve6ttq
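The trace-cutting that makes the Q-learning combination awkward is explicit in Watkins's Q(λ), sketched here for the tabular case: traces are zeroed whenever the behaviour action is not greedy:

```python
import numpy as np

def watkins_q_lambda_step(Q, z, s, a, r, s2, a2,
                          alpha=0.1, gamma=0.99, lam=0.9):
    """Watkins's Q(lambda): decay traces after greedy actions, cut them
    after exploratory ones -- the reason pairing Q-learning with traces
    is harder than Sarsa(lambda)."""
    a_star = int(Q[s2].argmax())              # greedy action at s2
    delta = r + gamma * Q[s2, a_star] - Q[s, a]
    z[s, a] += 1.0                            # accumulating trace
    Q = Q + alpha * delta * z
    if a2 == a_star:
        z = gamma * lam * z                   # greedy: decay traces
    else:
        z = np.zeros_like(z)                  # exploratory: cut traces
    return Q, z
```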
One-shot learning with spiking neural networks
[article]
2020
bioRxiv
pre-print
in RSNNs for large families of learning tasks. ...
The same learning approach also supports fast spike-based learning of posterior probabilities of potential input sources, thereby providing a new basis for probabilistic reasoning in RSNNs. ...
We would like to thank Sandra Diaz from the SimLab at the FZ Jülich for enabling the use of CSCS. ...
doi:10.1101/2020.06.17.156513
fatcat:q2iim666rvclbpfa2ngjuyuppe
Deep Reinforcement Learning Overview of the state of the Art
2018
Journal of Automation, Mobile Robotics & Intelligent Systems
Artificial intelligence has made big steps forward with reinforcement learning (RL) in the last century, and with the advent of deep learning (DL) in the 90s especially, the breakthrough of convolutional ...
In the end, we will discuss some potential research directions in the field of deep RL, for which we have great expectations that will lead to a real human level of intelligence. ...
more efficient way. ...
doi:10.14313/jamris_3-2018/15
fatcat:wn5i7y7tgfhvnhz3u5xkqlgvpe
Showing results 1–15 of 3,657 results