Expected Eligibility Traces [article]

Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
2021 arXiv   pre-print
In this work, we introduce expected eligibility traces.  ...  We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained.  ...  We introduce a bootstrapping mechanism that provides a spectrum of algorithms between standard eligibility traces and expected eligibility traces, and also discuss ways to apply these ideas with deep neural  ... 
arXiv:2007.01839v2 fatcat:2yxk7yab6jgxxfefxrd5xkliry

An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems

Hao Jiang, Renjie Gui, Zhen Chen, Liang Wu, Jian Dang, Jie Zhou
2019 IEEE Access  
In this article, we provide a novel improved model-free temporal-difference control algorithm, namely, Expected Sarsa(λ), using the average value as an update target and introducing eligibility traces  ...  INDEX TERMS: Model-free reinforcement learning, Sarsa, Q learning, eligibility traces.  ...  environmental information [5].  ...  CONCLUSION: We provided a novel improved TD control algorithm, namely Expected Sarsa(λ), for wireless communication networks. This algorithm combines Expected Sarsa and eligibility traces.  ... 
doi:10.1109/access.2019.2935255 fatcat:ugvxdekwjvhj7fqto3idebtyjy

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning [article]

Taisuke Kobayashi
2022 arXiv   pre-print
This design allows for the replacement of the most influential adaptively accumulated (decayed) eligibility traces.  ...  The dependency between parameters of deep neural networks would destroy the eligibility traces, which is why they are not integrated with DRL.  ...  It is expected that, with the appropriate parameters, the proposed method can achieve the benefits of both the standard and replacing eligibility traces.  ... 
arXiv:2008.10040v2 fatcat:h5eb4f7gpfgyva6bezf6za3thq

Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms [article]

Markus Dumke
2017 arXiv   pre-print
This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning.  ...  Of course Double Q(σ) can also be used with eligibility traces.  ...  These can be extended to use eligibility traces to incorporate data of multiple time steps. An eligibility trace is a scalar numeric value for each state-action pair.  ... 
arXiv:1711.01569v1 fatcat:pf2kzdpfszacti3yk4uuahztwi

Q-Learning With Eligibility Traces To Solve Non-Convex Economic Dispatch Problems

Mohammed I. Abouheaf, Sofie Haesaert, Wei-Jen Lee, Frank L. Lewis
2013 Zenodo  
The eligibility traces are used to speed up the Q-Learning process.  ...  Q-Learning with eligibility traces is used to solve Economic Dispatch problems with valve point loading effect, multiple fuel options, and power transmission losses.  ...  IV. Q(λ) LEARNING WITH ELIGIBILITY TRACES ALGORITHM In this section, an algorithm based on Q-Learning with eligibility traces is developed.  ... 
doi:10.5281/zenodo.1088772 fatcat:as7myba56zc73gpi7pns2gx6py

Temporal Second Difference Traces [article]

Mitchell Keith Bloch
2011 arXiv   pre-print
Replacing traces, using a recency heuristic, are more efficient but less reliable.  ...  We introduce both Optimistic Q(λ) and the temporal second difference trace (TSDT). TSDT is particularly powerful in deterministic domains.  ...  Beyond One-Step Methods Eligibility Traces Eligibility traces, such as Watkins' Q(λ), are a model-free method for using recent memory to speed reinforcement learning.  ... 
arXiv:1104.4664v1 fatcat:xcadn5morra5hmahqmocp4x25u

Selective Credit Assignment [article]

Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt
2022 arXiv   pre-print
Sparse expected eligibility traces Interestingly, learning expectation models of selective traces (cf. (8)) with binary weighting functions 𝜔 : S → {0, 1}, results in sparse expected eligibility traces  ...  Expected eligibility traces Expected eligibility trace (ET) algorithms have been introduced for off-trajectory, on-policy value learning, replacing the instantaneous trace e 𝑡 with an estimated expectation  ... 
arXiv:2202.09699v1 fatcat:26zcp3tku5hqfmhtiuojcjxw4a

Distinct Eligibility Traces for LTP and LTD in Cortical Synapses

Kaiwen He, Marco Huertas, Su Z. Hong, XiaoXiu Tie, Johannes W. Hell, Harel Shouval, Alfredo Kirkwood
2015 Neuron  
Here we report the first experimental demonstration of eligibility traces in cortical synapses.  ...  of eligibility traces for LTP and LTD as a plausible synaptic substrate for reward-based learning.  ...  (E-G) Time evolution of LTP- and LTD-promoting eligibility traces corresponding to the same trials as in (B)-(D). Magenta lines are LTP eligibility traces, and blue lines are LTD eligibility traces.  ... 
doi:10.1016/j.neuron.2015.09.037 pmid:26593091 pmcid:PMC4660261 fatcat:nd5oe7ajungv3hcp5d6p5k7dqu

Learning from delayed feedback: neural responses in temporal credit assignment

Matthew M. Walsh, John R. Anderson
2011 Cognitive, Affective, & Behavioral Neuroscience  
This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences.  ...  We found that eligibility traces improved response accuracy in all models.  ...  Eligibility traces Although RL algorithms provide a solution to the temporal credit assignment problem, eligibility traces can greatly improve the efficiency of these algorithms (Sutton & Barto, 1998)  ... 
doi:10.3758/s13415-011-0027-0 pmid:21416212 pmcid:PMC3208325 fatcat:j6j6dka3efd3nnnpwhlb2aahkq

A comparison of eligibility trace and momentum on SARSA in continuous state-and action-space

Barry D. Nichols
2017 2017 9th Computer Science and Electronic Engineering (CEEC)  
Here the Newton's Method direct action selection approach to continuous action-space reinforcement learning is extended to use an eligibility trace.  ...  The eligibility trace approach achieves a higher success rate with a far wider range of parameter values than the momentum approach and also trains in fewer trials on the Cart-Pole problem.  ...  Therefore using an eligibility trace would be expected to give improved results over standard SARSA with a momentum term when updating the ANN.  ... 
doi:10.1109/ceec.2017.8101599 dblp:conf/ceec/Nichols17 fatcat:u2qx3exmtjd3jeu6vl6onlf3z4

Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning

Rupal Rana, Fernando S. Oliveira
2014 Omega : The International Journal of Management Science  
In this article we use two popular methods, Q-learning [16] and the Q-learning with eligibility traces, originally proposed by Peng and Williams [17].  ...  If iPad 2 is launched and the demand profile is different from iPad, for the same time period, this difference is used by the algorithm to update the expectations about future demand for the product, implicitly  ...  Learning accelerates because of the use of eligibility traces. The eligibility trace methods can strengthen the whole sequence of pricing actions.  ... 
doi:10.1016/ fatcat:hilukzdqdzgj7hpvgfnvoneydi

The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces

Marco A. Huertas, Sarah E. Schwettmann, Harel Z. Shouval
2016 Frontiers in Synaptic Neuroscience  
RESULTS A Model of Stable Reinforcement Learning Based on Competition between Eligibility Traces The CRL is based on the assumption that at every synapse, two synaptic eligibility traces, one for LTP  ...  In such a case at steady state the integral of the LTP eligibility trace times its associated reward magnitude (solid line) is equal to the LTD eligibility trace times its associated reward (dashed line  ... 
doi:10.3389/fnsyn.2016.00037 pmid:28018206 pmcid:PMC5156839 fatcat:lunas5tb3rccrmudv7yc6uy6nm

Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules

Wulfram Gerstner, Marco Lehmann, Vasiliki Liakoni, Dane Corneil, Johanni Brea
2018 Frontiers in Neural Circuits  
Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only  ...  While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few  ...  The expected time scale of the synaptic eligibility trace should roughly match the maximal delay of reinforcers in conditioning experiments (Thorndike, 1911; Pavlov, 1927; Black et al., 1985), linking  ... 
doi:10.3389/fncir.2018.00053 pmid:30108488 pmcid:PMC6079224 fatcat:lztoa7kc3jeodg6e6fcpj3m2aq

Temporal difference learning with eligibility traces for the game connect four

Markus Thill, Samineh Bagheri, Patrick Koch, Wolfgang Konen
2014 2014 IEEE Conference on Computational Intelligence and Games  
Different versions of eligibility traces (standard, resetting, and replacing traces) are compared.  ...  In this work we study the benefits of eligibility traces added to this system. To the best of our knowledge, eligibility traces have not been used before for such a large system.  ...  Why are eligibility traces better?  ... 
doi:10.1109/cig.2014.6932870 dblp:conf/cig/ThillBKK14 fatcat:hkxl4ck76jcxlgfyftp5l5ztie

One-shot learning and behavioral eligibility traces in sequential decision making [article]

Marco Lehmann, He Xu, Vasiliki Liakoni, Michael Herzog, Wulfram Gerstner, Kerstin Preuschoff
2019 arXiv   pre-print
dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.  ...  Here we asked whether humans use eligibility traces. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward.  ...  RL 'without eligibility trace'. In both classes of algorithms, action biases or values that reflect the expected future reward are assigned to states.  ... 
arXiv:1707.04192v2 fatcat:eine3tythje7ljmeza3aqogjlu