
Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

Carlton Downey, Scott Sanner
2010 International Conference on Machine Learning  
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make efficient use of sampled data.  ...  In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return.  ...  NICTA is funded by the Australian Government's Backing Australia's Ability initiative, and the Australian Research Council's ICT Centre of Excellence program.  ... 
dblp:conf/icml/DowneyS10 fatcat:2dglujeqkzcijlupk6pnk4ok3q
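For context on the family of estimators this entry refers to, here is a minimal sketch of tabular TD(λ) policy evaluation with accumulating eligibility traces, which implicitly averages the n-step return estimators mentioned above. The environment/policy interface, step size, and trace decay below are illustrative assumptions, not the paper's Bayesian model-averaging scheme.

```python
import numpy as np

def td_lambda(env, policy, num_states, num_episodes=500,
              alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces (sketch).

    Assumes env.reset() -> state, env.step(action) -> (next_state, reward, done),
    and policy(state) -> action; these interfaces are illustrative.
    """
    V = np.zeros(num_states)              # value estimates
    for _ in range(num_episodes):
        e = np.zeros(num_states)          # eligibility traces, reset per episode
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            # One-step TD error; bootstrap only if the episode continues.
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e[s] += 1.0                    # accumulate trace for the visited state
            V += alpha * delta * e         # credit all recently visited states
            e *= gamma * lam               # decay traces toward older visits
            s = s_next
    return V
```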

Exploiting Distributional Temporal Difference Learning to Deal with Tail Risk

Peter Bossaerts, Shijie Huang, Nitin Yadav
2020 Risks  
We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis.  ...  We show that the resulting "efficient distributional RL" (e-disRL) learns much faster, and is robust once it settles on a policy.  ...  The effect of tail risk is marked under TD Learning, and is still quite noticeable when decoupling the immediate reward in the prediction error while estimating its mean using the sample average of past  ... 
doi:10.3390/risks8040113 fatcat:p56ng4d7wng2dfzni7fkawhneu
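The "decoupling" mentioned in the snippet can be read as replacing the raw sampled reward in the TD error with a running sample average of past rewards observed at that state. The sketch below is only a hedged illustration of that reading; the per-state bookkeeping and step sizes are assumptions, not necessarily the authors' exact construction.

```python
def decoupled_td0_step(V, reward_stats, s, r, s_next, done, alpha=0.1, gamma=0.99):
    """One TD(0) step in which the immediate reward entering the TD error is the
    incremental sample mean of rewards seen at state s, rather than the raw sample.

    reward_stats maps state -> (count, mean); this bookkeeping is illustrative.
    """
    count, mean = reward_stats.get(s, (0, 0.0))
    count += 1
    mean += (r - mean) / count                       # incremental sample average
    reward_stats[s] = (count, mean)
    target = mean + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])
    return V, reward_stats
```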

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances [article]

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
2019 arXiv   pre-print
Multi-agent reinforcement learning (MARL) has long been a significant and enduring research topic in both machine learning and control.  ...  With the recent development of (single-agent) deep RL, there is a resurgence of interest in developing new MARL algorithms, especially those that are backed by theoretical analysis.  ...  We then establish a finite-sample analysis for a decentralized variant of FQI for this setting.  ... 
arXiv:1912.03821v1 fatcat:555igege7balrb3iiavbkcj3dy

Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli
2019 Neural Information Processing Systems  
Leveraging these tools, we present an algorithm, Wasserstein Q-Learning (WQL), starting in the tabular case and then showing how it can be extended to deal with continuous domains.  ...  How does the uncertainty of the value function propagate when performing temporal difference learning?  ...  either the average-reward or finite-horizon setting.  ... 
dblp:conf/nips/MetelliLR19 fatcat:fopzpftyqffy5jhmgej3lr5xgu

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning [article]

Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves
2020 arXiv   pre-print
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps.  ...  We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions.  ...  Acknowledgments The authors thank the Reinforcement Learning and Artificial Intelligence research group, Amii, and the Vector Institute for providing the environment to nurture and support this research  ... 
arXiv:1909.03906v2 fatcat:2d5lijyvrjdnjaevmuqgmxb7ti
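As a rough illustration of the fixed-horizon value functions described in this entry (each predicting the sum of rewards over a fixed number of future steps), the sketch below keeps one tabular estimate per horizon and bootstraps horizon h off horizon h-1 at the next state, rather than off itself. The tabular setting, step size, and discounting are assumptions for illustration.

```python
import numpy as np

def fixed_horizon_td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """Apply one transition's fixed-horizon TD(0) updates (sketch).

    V is an (H+1, num_states) array where V[h, s] estimates the (discounted)
    sum of the next h rewards from state s, and V[0, :] is fixed at zero.
    Each horizon bootstraps off the horizon below it at the next state.
    """
    H = V.shape[0] - 1
    for h in range(1, H + 1):
        target = r + gamma * V[h - 1, s_next]
        V[h, s] += alpha * (target - V[h, s])
    return V
```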

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Kristopher De Asis, Alan Chan, Silviu Pitis, Richard Sutton, Daniel Graves
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) and the Thirty-Second Innovative Applications of Artificial Intelligence Conference (IAAI-20)  
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps.  ...  We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions.  ...  Acknowledgments The authors thank the Reinforcement Learning and Artificial Intelligence research group, Amii, and the Vector Institute for providing the environment to nurture and support this research  ... 
doi:10.1609/aaai.v34i04.5784 fatcat:mj7j5wgfarbl3lrami4yohitmy

Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation [article]

Xingang Guo, Bin Hu
2022 arXiv   pre-print
In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized  ...  Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS).  ...  Recently, there has been a growing interest in finite-time analysis of TD learning with linear function approximation in various settings [9] - [13] .  ... 
arXiv:2204.09801v1 fatcat:wtprk434obew3em4hqazgvh57u
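A common pattern for decentralized TD with networked agents, and the kind of parameter iteration that can be viewed as a jump linear system, combines a local TD(0) step with consensus averaging over neighbors. The sketch below is illustrative only; the mixing matrix, local-reward setup, and update ordering are assumptions, not necessarily the exact scheme analyzed in this paper.

```python
import numpy as np

def decentralized_td0_step(thetas, W, phi_s, phi_next, rewards, alpha=0.05, gamma=0.95):
    """One synchronous step of consensus-based TD(0) with linear features (sketch).

    thetas:  (N, d) array, one parameter vector per agent.
    W:       (N, N) doubly stochastic mixing matrix over the communication graph.
    phi_s, phi_next: shared feature vectors of the current/next state, shape (d,).
    rewards: length-N vector of each agent's local reward for this transition.
    """
    updated = np.empty_like(thetas)
    for i in range(len(thetas)):
        delta = rewards[i] + gamma * phi_next @ thetas[i] - phi_s @ thetas[i]
        updated[i] = thetas[i] + alpha * delta * phi_s      # local TD(0) step
    return W @ updated                                      # consensus averaging
```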

Reducing Sampling Error in Batch Temporal Difference Learning [article]

Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
2020 arXiv   pre-print
Finally, we conduct an empirical evaluation of PSEC-TD(0) on three batch value function learning tasks, with a hyperparameter sensitivity analysis, and show that PSEC-TD(0) produces value function estimates  ...  Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning.  ...  The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity  ... 
arXiv:2008.06738v1 fatcat:qpyg7ke7djgwjojoct2rierqou

Average-reward model-free reinforcement learning: a systematic review and literature mapping [article]

Vektor Dewanto, George Dunn, Ali Eshragh, Marcus Gallagher, Fred Roosta
2021 arXiv   pre-print
In this paper, we review model-free reinforcement learning that utilizes the average-reward optimality criterion in the infinite-horizon setting.  ...  Reinforcement learning is an important part of artificial intelligence.  ...  Vektor is supported by the University of Queensland Research Training Scholarship.  ... 
arXiv:2010.08920v2 fatcat:hmjm7djacncc7gh6jqeglm4iri

TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning [article]

Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox
2018 arXiv   pre-print
In this paper, we re-examine the role of TD in modern deep RL, using specially designed environments that control for specific factors that affect performance, such as reward sparsity, reward delay, and  ...  Yet we also find that finite-horizon MC is not inferior to TD, even when rewards are sparse or delayed. This makes MC a viable alternative to TD in deep RL.  ...  ACKNOWLEDGMENTS This project was funded in part by the BrainLinks-BrainTools Cluster of Excellence (DFG EXC 1086) and by the Intel Network on Intelligent Systems.  ... 
arXiv:1806.01175v1 fatcat:bzmsv5xngjg6xidug7qxgv2i6i
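To make the comparison in this entry concrete, here is a minimal contrast between the two targets being discussed: a finite-horizon Monte Carlo return computed purely from observed rewards, and a one-step TD target that bootstraps from a learned value estimate. The horizon, discount, and rollout format are illustrative assumptions.

```python
def finite_horizon_mc_target(rewards, gamma=0.99, horizon=32):
    """Finite-horizon Monte Carlo target: discounted sum of observed rewards,
    with no bootstrapping from a learned value function."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[:horizon]))

def td0_target(r, v_next, gamma=0.99):
    """TD(0) target: one observed reward plus a bootstrapped estimate of the rest."""
    return r + gamma * v_next
```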

Learning and Planning in Average-Reward Markov Decision Processes [article]

Yi Wan, Abhishek Naik, Richard S. Sutton
2021 arXiv   pre-print
We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent  ...  All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward.  ...  For RVI Q-learning and (uncentered) Differential Q-learning, these correspond to the parameter settings that resulted in the largest reward rate averaged over the training period (reference state  ... 
arXiv:2006.16318v3 fatcat:7d3nms7oojculfjmuq6orrbzce
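The point highlighted in the snippet, that the average-reward estimate is updated with the temporal-difference error rather than with the conventional error, can be illustrated with a tabular differential TD(0)-style evaluation step. The tabular setting and step sizes below are assumptions for illustration.

```python
def differential_td0_step(V, r_bar, s, r, s_next, alpha=0.1, eta=0.01):
    """One policy-evaluation step in the average-reward (undiscounted) setting.

    delta is the differential TD error; both the value table V and the
    reward-rate estimate r_bar are driven by this same error, which is the
    pattern the abstract above emphasizes.
    """
    delta = r - r_bar + V[s_next] - V[s]
    V[s] += alpha * delta                 # value update
    r_bar += eta * delta                  # TD-error-based update of the average reward
    return V, r_bar
```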

Control Theoretic Analysis of Temporal Difference Learning [article]

Donghwan Lee
2022 arXiv   pre-print
The goal of this paper is to investigate a control-theoretic analysis of linear stochastic iterative algorithms and temporal difference (TD) learning.  ...  Therefore, the proposed work provides additional insights on TD-learning and reinforcement learning with simple concepts and analysis tools from control theory.  ...  In particular, [16] proposed the first finite-time analysis of TD-learning in its pure form, and [17] improved the convergence of TD-learning in [16] by considering Markovian sampling and higher-order  ... 
arXiv:2112.14417v4 fatcat:xys63us4mbarfayovikauehx34

Analysis of Temporal Difference Learning: Linear System Approach [article]

Donghwan Lee, Do Wan Kim
2022 arXiv   pre-print
In this paper, we propose a simple control-theoretic finite-time analysis of TD-learning, which exploits linear system models and standard notions from the linear systems community.  ...  The goal of this technical note is to introduce a new finite-time convergence analysis of temporal difference (TD) learning based on stochastic linear system models.  ...  Related Works: 1) Finite-Time Analysis of TD-Learning: Recently, some progress has been made on the finite-time analysis of TD-learning algorithms [15]-[18].  ... 
arXiv:2204.10479v4 fatcat:up72rxb6erepbpyntrevczrqvq
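To make the linear-system viewpoint in these entries concrete: TD(0) with linear function approximation is affine in the parameter vector, so each iterate has the form θ ← θ + α(b_t − A_t θ) with a random matrix A_t and vector b_t determined by the sampled transition. A minimal sketch under assumed feature and sampling interfaces:

```python
import numpy as np

def linear_td0_step(theta, phi_s, phi_next, r, alpha=0.05, gamma=0.95):
    """TD(0) with linear function approximation, written to expose its
    stochastic linear-system structure: theta <- theta + alpha * (b_t - A_t @ theta)."""
    A_t = np.outer(phi_s, phi_s - gamma * phi_next)   # random matrix from this sample
    b_t = r * phi_s                                    # random vector from this sample
    return theta + alpha * (b_t - A_t @ theta)
```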

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
2018 The Journal of Artificial Intelligence Research  
Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis.  ...  The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.  ... 
doi:10.1613/jair.1.11251 fatcat:axcp56nezbeovooraextarycki
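For reference, the gradient-TD family mentioned in this entry maintains two parameter vectors updated on different time scales, keeping the per-step cost linear in the number of features. Below is a hedged sketch of one member (GTD2-style updates with an importance-sampling ratio for off-policy learning); it is illustrative and not necessarily the exact proximal-gradient variant analyzed in the paper.

```python
import numpy as np

def gtd2_step(theta, w, phi_s, phi_next, r, rho, alpha=0.01, beta=0.05, gamma=0.95):
    """One GTD2-style off-policy evaluation step with linear features (sketch).

    theta: value-function weights; w: auxiliary weights tracking the projected
    TD error; rho: importance-sampling ratio for the sampled action.
    Per-step cost is O(d) in the feature dimension.
    """
    delta = r + gamma * phi_next @ theta - phi_s @ theta        # TD error
    theta = theta + alpha * rho * (phi_s - gamma * phi_next) * (phi_s @ w)
    w = w + beta * (rho * delta - phi_s @ w) * phi_s
    return theta, w
```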

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity [article]

Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
2020 arXiv   pre-print
Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis.  ...  The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.  ... 
arXiv:2006.03976v1 fatcat:btdxyuh3obbq3fluc3vlofb6te
Showing results 1 — 15 out of 2,151 results