3,748 Hits in 5.5 sec

Recursive Least Squares Advantage Actor-Critic Algorithms [article]

Yuan Wang, Chunyuan Zhang, Tianzong Yu, Meng Ma
2022 arXiv   pre-print
In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) technique to update the parameters of linear function approximators for accelerating their  ...  As an important algorithm in deep reinforcement learning, advantage actor-critic (A2C) has achieved wide success in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency  ...  In LFARL, traditional actor-critic algorithms usually use the recursive least squares (RLS) method to improve their convergence performance.  ... 
arXiv:2201.05918v2 fatcat:kbfplafwebhyhf45d6hwnk4sym
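The RLS update that recurs throughout these results can be sketched for a linear approximator V(s) ≈ w·φ(s). This is a generic textbook-style sketch, not code from any cited paper; the forgetting factor, initialization, and toy data are illustrative assumptions.

```python
import numpy as np

def rls_update(w, P, phi, target, lam=0.99):
    """One recursive least-squares step fitting target ≈ w·phi.

    w: parameter vector, P: inverse-covariance estimate,
    phi: feature vector, lam: forgetting factor (illustrative default).
    """
    P_phi = P @ phi
    gain = P_phi / (lam + phi @ P_phi)     # RLS gain vector
    err = target - w @ phi                 # a priori prediction error
    w = w + gain * err
    P = (P - np.outer(gain, P_phi)) / lam  # rank-1 covariance downdate
    return w, P

# Recover a known linear function from noiseless samples.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w, P = np.zeros(2), 100.0 * np.eye(2)      # vague prior: large initial P
for _ in range(200):
    phi = rng.normal(size=2)
    w, P = rls_update(w, P, phi, w_true @ phi)
```

Compared with plain stochastic gradient descent, the gain vector adapts a per-direction step size from accumulated second-order statistics, which is the convergence-acceleration effect these papers exploit in the critic.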

An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm [chapter]

Jooyoung Park, Jongho Kim, Daesung Kang
2005 Lecture Notes in Computer Science  
This paper studies an actor-critic type algorithm utilizing the RLS (recursive least-squares) method, which is one of the most efficient techniques for adaptive signal processing, together with natural  ...  In the actor part of the studied algorithm, we follow the strategy of performing parameter updates via the natural gradient method, while in its update for the critic part, the recursive least-squares method  ...  gradient and the recursive least-squares method.  ... 
doi:10.1007/11596448_9 fatcat:fgt6o5ib5nfbxbewijp2p64lta

Natural actor–critic algorithms

Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
2009 Automatica  
We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs.  ...  Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.  ...  Convergence Analysis for Algorithm 3 As stated previously, the main idea in this algorithm is to minimize the least squares error in estimating the advantage function via function approximation.  ... 
doi:10.1016/j.automatica.2009.07.008 fatcat:svrhbszdwngdpa5bnx6okxwdy4
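The natural-gradient idea underlying these algorithms preconditions the vanilla policy gradient by the inverse Fisher information F = E[∇log π ∇log πᵀ]. A minimal Monte Carlo sketch on a hypothetical two-armed bandit (reward, sample counts, and ridge term are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                        # softmax policy logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_pi(theta, a):
    g = -softmax(theta)
    g[a] += 1.0                            # d/dtheta of log pi(a)
    return g

# Monte Carlo estimates of the Fisher matrix and the vanilla gradient.
N = 5000
F, g_J = np.zeros((2, 2)), np.zeros(2)
for _ in range(N):
    a = rng.choice(2, p=softmax(theta))
    r = 1.0 if a == 0 else 0.0             # arm 0 pays more (toy reward)
    g = grad_log_pi(theta, a)
    F += np.outer(g, g) / N
    g_J += g * r / N
# Precondition; a tiny ridge term since F is singular for a softmax.
nat_grad = np.linalg.solve(F + 1e-6 * np.eye(2), g_J)
```

Stepping along nat_grad instead of g_J makes the update invariant to how the policy is parameterized, which is the property the natural actor-critic convergence analyses build on.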

Online Reinforcement Learning-Based Control of an Active Suspension System Using the Actor Critic Approach

Ahmad Fares, Ahmad Bani Younes
2020 Applied Sciences  
The Temporal Difference (TD) advantage actor-critic algorithm is used with the appropriate reward function.  ...  Recursive Least Squares method (RLS).  ...  TD Advantage Actor-Critic Algorithm The TD advantage actor-critic algorithm is an online model-free algorithm that consists of two neural networks, the actor network π_θ(s) and the critic network V_U  ... 
doi:10.3390/app10228060 fatcat:t3a2sa527ffonophc2u3y5f5ye
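The loop described in this snippet (TD error as advantage estimate, separate actor and critic updates) can be sketched with linear-in-logits stand-ins for the two networks; the one-state toy task and learning rates are illustrative assumptions, not the paper's suspension setup.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)            # actor "network": logits for two actions
v = 0.0                        # critic "network": value of the single state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = 1.0 if a == 0 else 0.0            # action 0 is better (toy reward)
    delta = r - v                         # TD error; here gamma*V(s') = 0
    v += alpha_critic * delta             # critic moves toward the return
    grad = -pi
    grad[a] += 1.0                        # grad of log pi(a) wrt the logits
    theta += alpha_actor * delta * grad   # actor steps along delta * grad
```

The TD error delta doubles as an unbiased advantage estimate, so the actor raises the probability of actions that did better than the critic predicted.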

Variance-constrained actor-critic algorithms for discounted and average reward MDPs

L. A. Prashanth, Mohammad Ghavamzadeh
2016 Machine Learning  
We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers  ...  We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.  ...  actor-critic algorithm.  ... 
doi:10.1007/s10994-016-5569-5 fatcat:famcrbft2jalxjyqsyg5h5xtfa

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs [article]

Prashanth L.A., Mohammad Ghavamzadeh
2015 arXiv   pre-print
We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers  ...  We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.  ...  [59] showed the advantages of using these features in approximating the action-value function in actor-critic algorithms.  ... 
arXiv:1403.6530v2 fatcat:pnmnlbabqzhk3k74k7h5uj5mp4

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, Robert Babuska
2012 IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)  
into many actor-critic algorithms in the past few years.  ...  Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework.  ...  The episodic Natural Actor-Critic method in [16] applied to an anthropomorphic robot arm performing a baseball bat swing task. using a recursive least-squares natural actor-critic method in [60] .  ... 
doi:10.1109/tsmcc.2012.2218595 fatcat:4ecxuz34urddhktvafndxrncgq

Efficient Reinforcement Learning Using Recursive Least-Squares Methods

X. Xu, H. He, D. Hu
2002 The Journal of Artificial Intelligence Research  
The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control.  ...  The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method.  ...  Littman for their insights and constructive criticisms, which have helped improve the paper significantly.  ... 
doi:10.1613/jair.946 fatcat:noehlqv3pje3tf2jjhgya6vdwu

Adaptive optimal control for continuous-time linear systems based on policy iteration

D. Vrabie, O. Pastravanu, M. Abu-Khalaf, F.L. Lewis
2009 Automatica  
In this paper we propose a new scheme based on adaptive critics for finding online the state-feedback, infinite-horizon, optimal control solution of linear continuous-time systems using only partial knowledge  ...  The effectiveness of the algorithm is shown while finding the optimal load-frequency controller for a power system.  ...  Alternatively, the solution given by (22) can also be obtained using recursive estimation algorithms (e.g. gradient-descent algorithms or the Recursive Least Squares algorithm), in which case a persistence  ... 
doi:10.1016/j.automatica.2008.08.017 fatcat:fygohn5u4ncujceiharry6lbpu

Revisiting Natural Actor-Critics with Value Function Approximation [chapter]

Matthieu Geist, Olivier Pietquin
2010 Lecture Notes in Computer Science  
Consequently, derivation of actor-critic algorithms is not straightforward.  ...  Consequently, new forms of critics can easily be integrated in the actor-critic framework.  ...  Actually, [5] introduced a natural actor-critic based on the Least-Squares Temporal Differences (LSTD) algorithm of [13] .  ... 
doi:10.1007/978-3-642-16292-3_21 fatcat:q3kdinyxfrf7fdonjv4ov3lyqi

Convergence Results for Some Temporal Difference Methods Based on Least Squares

Huizhen Yu, D.P. Bertsekas
2009 IEEE Transactions on Automatic Control  
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ).  ...  Actor-critic algorithms are two-time-scale SA algorithms in which the actor part refers to stochastic gradient descent iterations on the space of policy parameters at the slow time-scale, while the critic  ...  The two algorithms are called the least squares temporal difference algorithm, LSTD(λ), first proposed by Bradtke and Barto [10] for λ = 0 and generalized by Boyan [11] to general λ, and the least squares policy  ... 
doi:10.1109/tac.2009.2022097 fatcat:nkndl35ihjaqjkzsdrnjedbpqi
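LSTD(0), the λ = 0 case of the family analyzed here, solves for the value weights in closed form from accumulated statistics A and b rather than by stochastic iteration. A self-contained sketch on a hypothetical two-state chain with tabular features (the chain and rewards are illustrative, not from the paper):

```python
import numpy as np

gamma = 0.9
# Deterministic toy chain: s0 -> s1 with reward 0, s1 -> s0 with reward 1.
transitions = [(0, 0.0, 1), (1, 1.0, 0)]
phi = np.eye(2)                            # one-hot (tabular) features

A, b = np.zeros((2, 2)), np.zeros(2)
for s, r, s_next in transitions:
    f, f_next = phi[s], phi[s_next]
    A += np.outer(f, f - gamma * f_next)   # LSTD(0) statistic A
    b += r * f                             # LSTD(0) statistic b
w = np.linalg.solve(A, b)                  # value estimates: V(s) = w[s]
```

With tabular features w matches the exact Bellman fixed point (V(s1) = 1/(1 - 0.9²) and V(s0) = 0.9·V(s1)); recursive variants maintain an estimate of A⁻¹ incrementally instead of solving once.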

TD-regularized actor-critic methods

Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan
2019 Machine Learning  
The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods.  ...  This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate.  ...  In both algorithms, both the actor and the critic gradients are optimized by ADAM (Kingma and Ba 2014) . For the policy update, both the advantage and the TD error estimates are standardized.  ... 
doi:10.1007/s10994-019-05788-0 fatcat:osifv5utpnft5kjlmh2xfnxktu

Multi-agent Natural Actor-critic Reinforcement Learning Algorithms [article]

Prashant Trivedi, Nandyala Hemachandra
2022 arXiv   pre-print
Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning paradigm. We propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms in this work.  ...  Using this, we theoretically show that the optimal value of the deterministic variant of the MAN algorithm at each iterate dominates that of the standard gradient-based multi-agent actor-critic (MAAC)  ...  We can tune w_i in such a way that the estimate of the least-squared error in the linear function approximation of the advantage function is minimized, i.e., E_{π_θ}(w_i) = (1/2) Σ_{s∈S, a_i∈A_i} d_θ(s, a_i)[w_i ψ_i(s  ... 
arXiv:2109.01654v3 fatcat:daq6vwi7nvg3dd7lnhup6vgc5q

Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems

Draguna Vrabie, Frank Lewis
2009 Neural Networks  
The algorithm is based on a reinforcement learning scheme, namely Policy Iteration, and makes use of neural networks, in an Actor/Critic structure, to parametrically represent the control policy and the  ...  The algorithm converges online to the optimal control solution without knowledge of the internal system dynamics. Closed-loop dynamic stability is guaranteed throughout.  ...  One should note that the least-squares method for finding the parameters of the cost function can be replaced with any other suitable method of parameter identification, recursive or not.  ... 
doi:10.1016/j.neunet.2009.03.008 pmid:19362449 fatcat:6w4v2h4idbf6tcgnbun7byqybq

Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning
액터-크리틱 퍼지 강화학습을 이용한 기는 로봇의 제어

Young-Joon Moon, Jae-Hoon Lee, Joo-Young Park
2009 Journal of Korean institute of intelligent systems  
algorithms studied for problems with continuous states and continuous actions along the line of the actor-critic strategy.  ...  In particular, this paper focuses on presenting a method combining the so-called ACFRL (actor-critic fuzzy reinforcement learning), which is an actor-critic type reinforcement learning based on fuzzy theory  ...  Aiming to apply reinforcement learning in continuous high-dimensional spaces, this paper considers a methodology that jointly uses the actor-critic method, fuzzy theory, and the RLS (recursive least-squares) filter.  ... 
doi:10.5391/jkiis.2009.19.4.519 fatcat:kjm675hmrbh35iejbyhlofxuqq