A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL. The file type is application/pdf.
Recursive Least Squares Advantage Actor-Critic Algorithms
[article]
2022
arXiv
pre-print
In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) method to update the parameters of linear function approximators for accelerating their ...
As an important algorithm in deep reinforcement learning, advantage actor critic (A2C) has been widely successful in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency ...
In LFARL, traditional actor-critic algorithms usually use the recursive least squares (RLS) method to improve their convergence performance. ...
arXiv:2201.05918v2
fatcat:kbfplafwebhyhf45d6hwnk4sym
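The snippets above describe updating the parameters of a linear function approximator with recursive least squares. A minimal sketch of one RLS step for a linear critic, with our own variable names (`theta`, `P`, forgetting factor `lam`) and a caller-supplied regression target, not the paper's code:

```python
import numpy as np

def rls_update(theta, P, phi, target, lam=1.0):
    """One recursive least-squares step for a linear approximator
    V(s) ~= theta @ phi(s).

    theta : current parameter vector, shape (n,)
    P     : inverse correlation matrix estimate, shape (n, n)
    phi   : feature vector of the visited state, shape (n,)
    target: regression target (e.g. a TD target r + gamma * V(s'))
    lam   : forgetting factor in (0, 1]
    """
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)      # gain vector
    error = target - theta @ phi       # prediction error
    theta = theta + k * error          # parameter update
    P = (P - np.outer(k, Pphi)) / lam  # inverse-correlation update
    return theta, P

# Fit V(s) = 2*s from noiseless targets in a handful of steps.
theta = np.zeros(1)
P = np.eye(1) * 100.0
for s in [1.0, 2.0, 3.0]:
    theta, P = rls_update(theta, P, np.array([s]), 2.0 * s)
```

Because RLS tracks second-order information in `P`, the parameters approach the least-squares solution far faster than a fixed-step-size gradient update would.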
An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm
[chapter]
2005
Lecture Notes in Computer Science
This paper studies an actor-critic type algorithm utilizing the RLS (recursive least-squares) method, which is one of the most efficient techniques for adaptive signal processing, together with natural ...
In the actor part of the studied algorithm, we follow the strategy of performing parameter update via the natural gradient method, while in its update for the critic part, the recursive least-squares method ...
gradient and the recursive least-squares method. ...
doi:10.1007/11596448_9
fatcat:fgt6o5ib5nfbxbewijp2p64lta
Natural actor–critic algorithms
2009
Automatica
We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. ...
Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. ...
Convergence Analysis for Algorithm 3: As stated previously, the main idea in this algorithm is to minimize the least squares error in estimating the advantage function via function approximation. ...
doi:10.1016/j.automatica.2009.07.008
fatcat:svrhbszdwngdpa5bnx6okxwdy4
Online Reinforcement Learning-Based Control of an Active Suspension System Using the Actor Critic Approach
2020
Applied Sciences
The Temporal Difference (TD) advantage actor critic algorithm is used with the appropriate reward function. ...
Recursive Least Squares method (RLS). ...
TD Advantage Actor Critic Algorithm The TD advantage actor critic algorithm is an online model-free algorithm that consists of two neural networks, the actor network π_θ(s) and the critic network V_U ...
doi:10.3390/app10228060
fatcat:t3a2sa527ffonophc2u3y5f5ye
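The TD advantage actor-critic described in this entry uses the one-step TD error as an estimate of the advantage. A hedged sketch with a linear critic; the function name, step sizes, and the caller-supplied score vector `grad_logpi` are our illustrative choices, not the paper's implementation:

```python
import numpy as np

def td_advantage_step(v, theta, phi_s, phi_s2, r, gamma,
                      alpha_v, alpha_pi, grad_logpi):
    """One coupled update of critic weights v and actor parameters theta.
    The one-step TD error delta serves as the advantage estimate."""
    delta = r + gamma * (v @ phi_s2) - (v @ phi_s)  # TD error ~= advantage
    v = v + alpha_v * delta * phi_s                 # critic: semi-gradient TD(0)
    theta = theta + alpha_pi * delta * grad_logpi   # actor: policy-gradient step
    return v, theta, delta
```

A single call with `r = 1`, `gamma = 0.9`, and critic value 0.5 at both states yields `delta = 1 + 0.45 - 0.5 = 0.95`, so both networks move in the direction weighted by that error.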
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
2016
Machine Learning
We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers ...
We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application. ...
actor-critic algorithm. ...
doi:10.1007/s10994-016-5569-5
fatcat:famcrbft2jalxjyqsyg5h5xtfa
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
[article]
2015
arXiv
pre-print
We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers ...
We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application. ...
[59] showed the advantages of using these features in approximating the action-value function in actor-critic algorithms. ...
arXiv:1403.6530v2
fatcat:pnmnlbabqzhk3k74k7h5uj5mp4
A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
2012
IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)
into many actor-critic algorithms in the past few years. ...
Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. ...
The episodic Natural Actor-Critic method in [16] applied to an anthropomorphic robot arm performing a baseball bat swing task. ... using a recursive least-squares natural actor-critic method in [60]. ...
doi:10.1109/tsmcc.2012.2218595
fatcat:4ecxuz34urddhktvafndxrncgq
Efficient Reinforcement Learning Using Recursive Least-Squares Methods
2002
The Journal of Artificial Intelligence Research
The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. ...
The Fast-AHC algorithm is derived by applying the proposed RLS-TD(lambda) algorithm in the critic network of the adaptive heuristic critic method. ...
Littman for their insights and constructive criticisms, which have helped improve the paper significantly. ...
doi:10.1613/jair.946
fatcat:noehlqv3pje3tf2jjhgya6vdwu
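As a rough illustration of the RLS-TD(λ) idea this entry builds on, here is one step of the scheme with an eligibility trace; variable names, the unit regularization in the gain denominator, and the initialization are our assumptions, not the authors' implementation:

```python
import numpy as np

def rls_td_update(theta, P, z, phi, phi_next, r, gamma, lam):
    """One RLS-TD(lambda) step for a linear critic V(s) ~= theta @ phi(s).
    theta: critic weights; P: inverse correlation matrix; z: eligibility trace."""
    z = gamma * lam * z + phi            # eligibility-trace update
    d = phi - gamma * phi_next           # temporal-difference feature
    delta = r - theta @ d                # TD error: r + gamma*theta@phi' - theta@phi
    K = P @ z / (1.0 + d @ (P @ z))      # RLS gain
    theta = theta + K * delta
    P = P - np.outer(K, d @ P)
    return theta, P, z

# Single absorbing state with reward 1 and gamma = 0.5: V = 1/(1-0.5) = 2.
theta, P, z = np.zeros(1), np.eye(1) * 100.0, np.zeros(1)
for _ in range(10):
    theta, P, z = rls_td_update(theta, P, z,
                                np.array([1.0]), np.array([1.0]),
                                1.0, 0.5, 0.0)
```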
Adaptive optimal control for continuous-time linear systems based on policy iteration
2009
Automatica
In this paper we propose a new scheme based on adaptive critics for finding online the state feedback, infinite horizon, optimal control solution of linear continuous-time systems using only partial knowledge ...
The effectiveness of the algorithm is shown while finding the optimal-load-frequency controller for a power system. ...
Alternatively, the solution given by (22) can be obtained also using recursive estimation algorithms (e.g. gradient descent algorithms or the Recursive Least Squares algorithm) in which case a persistence ...
doi:10.1016/j.automatica.2008.08.017
fatcat:fygohn5u4ncujceiharry6lbpu
Revisiting Natural Actor-Critics with Value Function Approximation
[chapter]
2010
Lecture Notes in Computer Science
Consequently, derivation of actor-critic algorithms is not straightforward. ...
Consequently, new forms of critics can easily be integrated in the actor-critic framework. ...
Actually, [5] introduced a natural actor-critic based on the Least-Squares Temporal Differences (LSTD) algorithm of [13]. ...
doi:10.1007/978-3-642-16292-3_21
fatcat:q3kdinyxfrf7fdonjv4ov3lyqi
Convergence Results for Some Temporal Difference Methods Based on Least Squares
2009
IEEE Transactions on Automatic Control
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). ...
Actor-critic algorithms are two-time-scale SA algorithms in which the actor part refers to stochastic gradient descent iterations on the space of policy parameters at the slow time-scale, while the critic ...
The two algorithms are called the least squares temporal difference algorithm, LSTD(λ), first proposed by Bradtke and Barto [10] for λ = 0 and generalized by Boyan [11] to general λ, and the least squares policy ...
doi:10.1109/tac.2009.2022097
fatcat:nkndl35ihjaqjkzsdrnjedbpqi
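Unlike incremental TD, LSTD(λ) accumulates sufficient statistics and solves a linear system once. A compact batch sketch, assuming our own data layout (a list of `(phi, r, phi_next)` tuples) and one-hot state features in the example:

```python
import numpy as np

def lstd(transitions, gamma, lam=0.0):
    """Batch LSTD(lambda): solve A @ theta = b, where
    A = sum_t z_t (phi_t - gamma*phi_{t+1})^T  and  b = sum_t z_t r_t."""
    n = len(transitions[0][0])
    A = np.zeros((n, n))
    b = np.zeros(n)
    z = np.zeros(n)                      # eligibility trace
    for phi, r, phi_next in transitions:
        z = gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
    return np.linalg.solve(A, b)

# Two-state chain: s0 -> s1 (reward 0), s1 -> s1 (reward 1), gamma = 0.9,
# so V(s1) = 1/(1-0.9) = 10 and V(s0) = 0.9 * 10 = 9.
theta = lstd([(np.array([1.0, 0.0]), 0.0, np.array([0.0, 1.0])),
              (np.array([0.0, 1.0]), 1.0, np.array([0.0, 1.0]))], 0.9)
```

Solving the system directly is what makes least-squares policy evaluation sample-efficient relative to stochastic-approximation TD, at O(n^2) memory per step.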
TD-regularized actor-critic methods
2019
Machine Learning
The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. ...
This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. ...
In both algorithms, the actor and critic gradients are optimized with ADAM (Kingma and Ba 2014). For the policy update, both the advantage and the TD error estimates are standardized. ...
doi:10.1007/s10994-019-05788-0
fatcat:osifv5utpnft5kjlmh2xfnxktu
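One common reading of the TD-regularization idea in this entry is that the policy-gradient term is shrunk by a squared-TD-error penalty, so the actor takes smaller steps when the critic is inaccurate. The scalar form below is our illustrative simplification; the penalty weight `eta` is an assumed name, not a parameter from the paper:

```python
import numpy as np

def td_regularized_actor_grad(advantage, grad_logpi, td_error, eta):
    """Sketch of a TD-regularized policy-gradient direction: the usual
    advantage-weighted score, with the advantage reduced by eta * delta^2."""
    return (advantage - eta * td_error ** 2) * grad_logpi
```

With `advantage = 1.0`, `td_error = 0.5`, and `eta = 1.0`, the effective weight drops from 1.0 to 0.75, damping the actor update exactly when the critic's TD error is large.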
Multi-agent Natural Actor-critic Reinforcement Learning Algorithms
[article]
2022
arXiv
pre-print
Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning paradigm. We propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms in this work. ...
Using this, we theoretically show that the optimal value of the deterministic variant of the MAN algorithm at each iterate dominates that of the standard gradient-based multi-agent actor-critic (MAAC) ...
We can tune w^i in such a way that the least-squares error of the linear function approximation of the advantage function is minimized, i.e., E^{π_θ}(w^i) = (1/2) Σ_{s∈S, a^i∈A^i} d^θ(s, a^i) [w^{i⊤} ψ^i(s, ...
arXiv:2109.01654v3
fatcat:daq6vwi7nvg3dd7lnhup6vgc5q
Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
2009
Neural Networks
The algorithm is based on a reinforcement learning scheme, namely Policy Iterations, and makes use of neural networks, in an Actor/Critic structure, to parametrically represent the control policy and the ...
The algorithm converges online to the optimal control solution without knowledge of the internal system dynamics. Closed-loop dynamic stability is guaranteed throughout. ...
One should note that the least-squares method for finding the parameters of the cost function can be replaced with any other suitable, recursive or not recursive, method of parameter identification. ...
doi:10.1016/j.neunet.2009.03.008
pmid:19362449
fatcat:6w4v2h4idbf6tcgnbun7byqybq
Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning
액터-크리틱 퍼지 강화학습을 이용한 기는 로봇의 제어
2009
Journal of Korean institute of intelligent systems
algorithms studied for problems with continuous states and continuous actions along the line of the actor-critic strategy. ...
In particular, this paper focuses on presenting a method combining the so-called ACFRL(actor-critic fuzzy reinforcement learning), which is an actor-critic type reinforcement learning based on fuzzy theory ...
In this paper, aiming to apply reinforcement learning to continuous high-dimensional spaces, we consider a methodology that combines the actor-critic method, fuzzy theory, and the RLS (recursive least-squares) filter. ...
doi:10.5391/jkiis.2009.19.4.519
fatcat:kjm675hmrbh35iejbyhlofxuqq
Showing results 1 — 15 out of 3,748 results