
Geometric Insights into the Convergence of Nonlinear TD Learning [article]

David Brandfonbrener, Joan Bruna
2020 arXiv   pre-print
More precisely, we consider the expected learning dynamics of the TD(0) algorithm for value estimation.  ...  While there are convergence guarantees for temporal difference (TD) learning when using linear function approximators, the situation for nonlinear models is far less understood, and divergent examples  ...  We also thank the lab mates, especially Will Whitney, Aaron Zweig, and Min Jae Song, who provided useful discussions and feedback. This work was partially supported by the Alfred P.  ... 
arXiv:1905.12185v4 fatcat:qd5dhbji6ja7fpmbqks435s5pe
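
The entry above studies the expected learning dynamics of TD(0) for value estimation. As a point of reference, here is a minimal sketch of semi-gradient TD(0) on a hypothetical three-state Markov reward process; the chain, rewards, and step size are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical 3-state Markov reward process under a fixed policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

w = np.zeros(3)   # tabular values: linear approximation with identity features
alpha = 0.05
rng = np.random.default_rng(0)

s = 0
for _ in range(100_000):
    s_next = rng.choice(3, p=P[s])
    # Semi-gradient TD(0): the bootstrap target r + gamma * w[s'] is held fixed.
    delta = r[s] + gamma * w[s_next] - w[s]
    w[s] += alpha * delta
    s = s_next

# True values solve the Bellman equation V = r + gamma * P @ V.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

In this tabular (linear) case the iterates settle near v_true; with general nonlinear approximators that convergence can fail, which is exactly the gap the paper analyzes.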

A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications

Warren B. Powell, Jun Ma
2011 Journal of Control Theory and Applications  
We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures  ...  We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued  ...  Residual gradient algorithm To overcome the instability of Q-learning or value iteration when implemented directly with a general function approximation, residual gradient algorithms, which perform gradient  ... 
doi:10.1007/s11768-011-0313-y fatcat:ea6l7fzscjdbflgrft3b33b7ve
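
The residual gradient algorithm mentioned in the snippet descends the mean-squared Bellman error directly. A deterministic expected-dynamics sketch (the toy chain, weighting, and step size are hypothetical) makes the difference from semi-gradient TD visible: the gradient also flows through the bootstrap target:

```python
import numpy as np

# Hypothetical chain; D weights states by the (uniform) stationary distribution.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9
D = np.eye(3) / 3.0

w = np.zeros(3)
alpha = 1.0
for _ in range(10_000):
    bellman_res = r + gamma * P @ w - w
    # Residual gradient: differentiate 0.5 * ||bellman_res||_D^2 w.r.t. w,
    # including the gamma * P @ w term inside the target.
    grad = (gamma * P - np.eye(3)).T @ D @ bellman_res
    w -= alpha * grad

# In the tabular case the zero-residual minimizer is the true value function.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

With sampled transitions this gradient requires double sampling to remain unbiased, which is one reason plain TD often converges faster in practice despite its weaker stability guarantees.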

Faster Gradient-TD Algorithms

Leah M Hackman
Gradient-TD methods are a new family of learning algorithms that are stable and convergent under a wider range of conditions than previous reinforcement learning algorithms.  ...  In this thesis, we examine this slowness through on-and off-policy experiments and introduce several variations of existing gradient-TD algorithms in search of faster gradient-TD methods.  ...  These algorithms solve the problem of gradient-TD methods being slower than conventional-TD methods on on-policy problems and show promise in providing faster convergence on off-policy problems.  ... 
doi:10.7939/r3js95 fatcat:l7yxrw764zarrovhg6tvbqjiyq

Beyond Target Networks: Improving Deep Q-learning with Functional Regularization [article]

Alexandre Piché, Valentin Thomas, Joseph Marino, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan
2022 arXiv   pre-print
This leads to a faster yet more stable training method.  ...  We analyze the convergence of our method theoretically and empirically validate our predictions on simple environments as well as on a suite of Atari environments.  ...  While we have a theoretical understanding (Schoknecht & Merke, 2003) of why and how TD(0) may converge faster than its symmetric alternative, residual gradient (Baird, 1995) , this is not the case for  ... 
arXiv:2106.02613v3 fatcat:r55ll26mr5b6xohq4plplx7nci
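
The functional-regularization idea can be sketched for a linear Q-function: rather than building the bootstrap target from a frozen target network, regress toward the online target while penalizing deviation of Q(s,a) from a lagged copy's prediction. The function names, feature vector, and kappa weight below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def fr_loss(w, w_prior, phi_sa, target, kappa):
    """Per-transition objective: squared TD error toward a given bootstrap
    target, plus a penalty tying Q(s,a) to the lagged network's output."""
    q = phi_sa @ w
    q_prior = phi_sa @ w_prior
    return 0.5 * (q - target) ** 2 + 0.5 * kappa * (q - q_prior) ** 2

def fr_grad(w, w_prior, phi_sa, target, kappa):
    q = phi_sa @ w
    q_prior = phi_sa @ w_prior
    return ((q - target) + kappa * (q - q_prior)) * phi_sa

# Evaluate the analytic gradient at an arbitrary point.
w = np.array([0.5, -0.3])
w_prior = np.array([0.1, 0.2])
phi = np.array([1.0, 2.0])
g = fr_grad(w, w_prior, phi, 1.5, 0.25)
```

The regularizer pulls predictions, not parameters, toward the lagged copy, so it constrains the function the network represents rather than its weights.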

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning? [article]

Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
2020 arXiv   pre-print
always be better than SGD.  ...  Our theoretical findings demonstrate that including this additional preconditioning information is, surprisingly, comparable to normal semi-gradient TD if the optimal learning rate is found for both via  ...  TD(0) converges provably faster than the residual gradient algorithm. In International Conference on Machine Learning, pp. 680-687, 2003. Sutton, R. S.  ... 
arXiv:2007.02786v1 fatcat:iud2qvbqyzegpa2mkqem4rehji
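
Jacobi preconditioning scales each coordinate of an update by the inverse diagonal of the relevant curvature matrix. For expected linear TD, where the update is b - A @ w, a minimal sketch looks as follows (toy chain; this is not the paper's TDprop estimator, which works from sampled per-parameter Jacobian information):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9
D = np.eye(3) / 3.0      # uniform stationary distribution
Phi = np.eye(3)          # tabular features

# Expected linear TD update is b - A @ w.
A = Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi
b = Phi.T @ D @ r

w = np.zeros(3)
alpha = 0.5
m_inv = 1.0 / np.diag(A)  # Jacobi preconditioner: inverse diagonal of A
for _ in range(1000):
    w += alpha * m_inv * (b - A @ w)

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

The fixed point A @ w = b is unchanged by the preconditioner; only the per-coordinate step sizes differ, which is why tuning a single learning rate well can match it, as the snippet suggests.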

Two Timescale Convergent Q-learning for Sleep--Scheduling in Wireless Sensor Networks [article]

Prashanth L.A., Abhranil Chatterjee, Shalabh Bhatnagar
2014 arXiv   pre-print
Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation (SPSA) estimate on the faster timescale, while the Q-value parameter  ...  This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees.  ...  Our algorithms are simple, efficient and in the case of the two-timescale on-policy Q-learning based schemes, also provably convergent.  ... 
arXiv:1312.7292v2 fatcat:ktdruc6fpzerjfalepev576zxm
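
The one-simulation SPSA estimate mentioned above perturbs all parameters simultaneously with a single Rademacher vector and forms a gradient estimate from one noisy function evaluation. A sketch on a hypothetical quadratic (the objective and constants are illustrative; the paper applies this to a simulated sleep-scheduling cost):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(theta):
    # Hypothetical smooth objective standing in for a simulated cost
    # (vectorized over rows of theta).
    return theta[:, 0] ** 2 + 2.0 * theta[:, 1] ** 2 + theta[:, 0] * theta[:, 1]

theta = np.array([1.0, -1.0])
true_grad = np.array([2 * theta[0] + theta[1], 4 * theta[1] + theta[0]])

delta = 0.1
n = 200_000
Delta = rng.choice([-1.0, 1.0], size=(n, 2))  # Rademacher perturbations
vals = f(theta + delta * Delta)               # one evaluation per estimate
# One-measurement SPSA: ghat_i = f(theta + delta*Delta) / (delta * Delta_i);
# for +/-1 perturbations, 1 / Delta_i equals Delta_i.
ghat = vals[:, None] * Delta / delta
g_mean = ghat.mean(axis=0)
```

Averaging many estimates recovers the true gradient; in the two-timescale algorithm, individual noisy estimates drive the slow policy-gradient timescale while the Q-value parameters are updated on the fast one.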

Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks

L. A. Prashanth, Abhranil Chatterjee, Shalabh Bhatnagar
2014 Wireless networks  
Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising  ...  This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees.  ...  Our algorithms are simple, efficient and in the case of the two-timescale on-policy Q-learning based schemes, also provably convergent.  ... 
doi:10.1007/s11276-014-0762-6 fatcat:5gcavzxh4bempep7x4uty57p5e

Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme [article]

K.E. Avrachenkov, V.S. Borkar, H.P. Dolhare, K. Patil
2021 arXiv   pre-print
We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the o.d.e. (for 'ordinary differential equation') approach and point out certain theoretical issues.  ...  We then propose a modified scheme called Full Gradient DQN (FG-DQN, for short) that has a sound theoretical basis and compare it with the original scheme on sample problems.  ...  Acknowledgement The authors are greatly obliged to Prof  ... 
arXiv:2103.05981v3 fatcat:ejfvc6ps7zdptbwd3awkku4mqq

Asynchronous Approximation of a Single Component of the Solution to a Linear System [article]

Asuman Ozdaglar, Devavrat Shah, Christina Lee Yu
2019 arXiv   pre-print
Our algorithm relies on the Neumann series characterization of the component x_i, and is based on residual updates.  ...  This is equivalent to solving for x_i in x = Gx + z for some G and z such that the spectral radius of G is less than 1.  ...  distributions exhibit faster convergence rates.  ... 
arXiv:1411.2647v4 fatcat:lobd2mcnmrfulbdkwdgava4am4
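
The Neumann-series characterization in the snippet can be sketched directly: when the spectral radius of G is below 1, x = Σ_k G^k z, so x_i accumulates as e_i^T G^k z with a residual row vector carrying the powers of G. The matrix, vector, and index below are illustrative; the paper's algorithm is asynchronous and sample-based:

```python
import numpy as np

G = np.array([[0.1, 0.2, 0.0],
              [0.0, 0.1, 0.3],
              [0.2, 0.0, 0.1]])
z = np.array([1.0, -1.0, 2.0])
i = 0                           # component of x = G @ x + z to estimate

estimate = 0.0
residual = np.zeros(3)
residual[i] = 1.0               # residual holds the row vector e_i^T G^k
for _ in range(100):
    estimate += residual @ z    # add e_i^T G^k z, the k-th Neumann term
    residual = residual @ G     # advance to the next power of G

x_true = np.linalg.solve(np.eye(3) - G, z)
```

Only the rows of G touched by the residual are ever needed, which is what makes a local, single-component algorithm possible.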

PID Accelerated Value Iteration Algorithm

Amir Massoud Farahmand, Mohammad Ghavamzadeh
2021 International Conference on Machine Learning  
We present the error dynamics of these variants of VI, and provably (for certain classes of MDPs) and empirically (for more general classes) show that the convergence rate can be significantly improved  ...  The key insight is the realization that the evolution of the value function approximations (V_k)_{k≥0} in the VI procedure can be seen as a dynamical system.  ...  Acknowledgements We would like to thank the anonymous reviewers for their feedback. AMF acknowledges the funding from the Canada CIFAR AI Chairs program.  ... 
dblp:conf/icml/FarahmandG21 fatcat:3omjpn7pc5b5pcteswcm46jaqm
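
The dynamical-systems view invites control-style acceleration of VI. A minimal sketch adds only a momentum (derivative-like) term to policy-evaluation VI; the gain and MDP are illustrative, and the paper's controller also has proportional and integral terms and handles the max operator of control VI:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

def bellman(v):
    return r + gamma * P @ v   # policy-evaluation Bellman operator

beta = 0.2                     # derivative-style gain on the value "velocity"
v, v_prev = np.zeros(3), np.zeros(3)
for _ in range(200):
    v, v_prev = bellman(v) + beta * (v - v_prev), v

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

On this chain the extra term shrinks the dominant error mode from the plain-VI contraction factor gamma = 0.9 to roughly 0.87, a modest but real speedup of the kind the paper engineers far more carefully.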

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms [article]

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
2021 arXiv   pre-print
the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc.  ...  Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of  ...  Some other common policy evaluation algorithms with convergence guarantees include gradient TD methods with linear [61, 62, 63] , and nonlinear function approximations [64] .  ... 
arXiv:1911.10635v2 fatcat:ihlhtjlhnrdizbkcfzsnz5urfq

Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence [article]

Yuetian Luo, Anru R. Zhang
2021 arXiv   pre-print
Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first quadratic convergence guarantee of RGN for low-rank tensor estimation under some mild conditions  ...  A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN.  ...  The simulation studies show RGN offers much faster convergence compared to the existing approaches in the literature.  ... 
arXiv:2104.12031v2 fatcat:nnqncngurfg2pc23qlwbvjmafq

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem [article]

Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto
2018 arXiv   pre-print
Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy  ...  For reversible policies, the result can be interpreted as the tension between two terms of the loss function that TD minimises, as recently described by [Ollivier, 2018].  ...  Sutton et al. [2009b,a] introduce two modified algorithms for TD with linear function approximation that provably converge in the off-policy setting.  ... 
arXiv:1807.03064v1 fatcat:to63dgnhnbf47ml5zdmyxqnpxi

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [article]

Mingde Zhao
2020 arXiv   pre-print
Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn  ...  To improve the sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner.  ...  Note that the convergence of linear semi-gradient TD(0) algorithm presented in Algorithm 7 does not follow from general results on SGD but a separate theorem.  ... 
arXiv:2006.08906v1 fatcat:z4vsafqmm5dqlluues7bv26buy
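
The eligibility-trace parameter being meta-learned here is the λ of TD(λ). A minimal sketch of accumulating-trace TD(λ) with a fixed λ on a hypothetical chain (the thesis adapts λ in a state-dependent manner; all constants below are illustrative):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma, lam, alpha = 0.9, 0.5, 0.02

w = np.zeros(3)
e = np.zeros(3)                 # accumulating eligibility trace
rng = np.random.default_rng(2)
s = 0
for _ in range(200_000):
    s_next = rng.choice(3, p=P[s])
    delta = r[s] + gamma * w[s_next] - w[s]
    e *= gamma * lam            # decay credit assigned to past states
    e[s] += 1.0                 # accumulate credit for the current state
    w += alpha * delta * e      # every traced state shares the TD error
    s = s_next

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

λ interpolates between TD(0) (λ = 0) and Monte-Carlo-like updates (λ = 1), which is why tuning it per state is a natural target for improving sample efficiency.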

RISE: An Incremental Trust-Region Method for Robust Online Sparse Least-Squares Estimation

David M. Rosen, Michael Kaess, John J. Leonard
2014 IEEE Transactions on robotics  
As a trust-region method, RISE is naturally robust to objective function nonlinearity and numerical ill-conditioning, and is provably globally convergent for a broad class of inferential cost functions  ...  Consequently, RISE maintains the speed of current state-of-the-art online sparse least-squares methods while providing superior reliability.  ...  ACKNOWLEDGMENTS The authors would like to thank F. Dellaert and R. Roberts for the RISE2 implementation in the GTSAM library.  ... 
doi:10.1109/tro.2014.2321852 fatcat:7p2fgpqchbb3fea4l5yct4wyai
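
RISE's robustness comes from the trust-region mechanism: accept a step only when it reduces the cost, and adapt the region otherwise. A toy nonlinear least-squares sketch using Levenberg-Marquardt damping (a standard trust-region-style scheme, not the incremental Powell's dog-leg machinery of RISE itself) on the Rosenbrock residuals:

```python
import numpy as np

def res(x):
    # Rosenbrock function written as a least-squares residual vector.
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def jac(x):
    return np.array([[-20.0 * x[0], 10.0],
                     [-1.0, 0.0]])

x = np.array([-1.2, 1.0])
lam = 1.0                                   # damping ~ inverse trust-region radius
for _ in range(200):
    r_, J = res(x), jac(x)
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r_)
    if np.sum(res(x + step) ** 2) < np.sum(r_ ** 2):
        x = x + step                        # cost decreased: accept, expand region
        lam = max(lam * 0.5, 1e-10)
    else:
        lam *= 2.0                          # cost increased: reject, shrink region
```

The accept/reject test is what confers global convergence on nonconvex costs; RISE additionally makes each such step incremental as new measurements arrive.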
Showing results 1 — 15 out of 122 results