Neural Network Reinforcement Learning for Walking Control of a 3-Link Biped Robot
2015
International Journal of Engineering and Technology
The adaptive control agent consists of two neural network units, known as actor and critic for learning prediction and learning control tasks. ...
Reinforcement Learning (RL) is one of these major techniques, which has been widely used in robot control approaches. ...
The proposed controller is an actor-critic reinforcement learning unit, in which the actor and the critic are two 3-layered feedforward neural networks with variable network weights. ...
doi:10.7763/ijet.2015.v7.835
fatcat:txj2ogwyzve5hd2okwks3bn3gu
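As a rough illustration of the actor-critic arrangement this entry describes (two small feedforward networks, one learning to predict a value and one learning the control), here is a minimal NumPy sketch. The layer sizes, learning rates, Gaussian exploration, and TD(0)-style updates are assumptions for illustration, not details taken from the paper.

import numpy as np

class ThreeLayerNet:
    """Feedforward net with one tanh hidden layer (input-hidden-output, i.e. 3-layered)."""
    def __init__(self, n_in, n_hidden, n_out, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.lr = lr

    def forward(self, x):
        self.x, self.h = x, np.tanh(self.W1 @ x)
        return self.W2 @ self.h

    def step(self, grad_out):
        # Gradient step on the cached forward pass; grad_out is dLoss/dOutput.
        dW2 = np.outer(grad_out, self.h)
        dh = (self.W2.T @ grad_out) * (1.0 - self.h ** 2)
        self.W2 -= self.lr * dW2
        self.W1 -= self.lr * np.outer(dh, self.x)

# Hypothetical sizes: 6 state variables (joint angles/velocities), 2 torque outputs.
gamma, sigma = 0.99, 0.1
critic = ThreeLayerNet(6, 16, 1)   # learning prediction: V(s)
actor = ThreeLayerNet(6, 16, 2)    # learning control: mean torque mu(s)

def actor_critic_update(s, a, r, s_next):
    v_next = critic.forward(s_next)
    v = critic.forward(s)                        # cache activations for s
    td_error = (r + gamma * v_next - v).item()   # temporal-difference prediction error
    critic.step(np.array([-td_error]))           # raise V(s) when the outcome beat the prediction
    mu = actor.forward(s)
    actor.step(-td_error * (a - mu) / sigma**2)  # Gaussian policy-gradient-style nudge
    return td_error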
Actor-critic neural network reinforcement learning for walking control of a 5-link bipedal robot
2014
2014 Second RSI/ISM International Conference on Robotics and Mechatronics (ICRoM)
Moreover, since the neural networks are implemented in both the actor and the critic sections, we have added a learning database to reduce the probability of inaccurate approximation of the nonlinear ...
Our control agent consists of two three-layered neural network units, known as the critic and the actor for learning prediction and learning control tasks. ...
This controller is an actor-critic reinforcement learning unit, in which the actor and the critic are two three-layered feedforward neural networks. ...
doi:10.1109/icrom.2014.6990997
fatcat:i35kwwhqlrfvbd6pgv2cpm2lbu
Wavelet Neural Network Observer Based Adaptive Tracking Control for a Class of Uncertain Nonlinear Delayed Systems Using Reinforcement Learning
2012
International Journal of Intelligent Systems and Applications
This paper is concerned with the observer design problem for a class of uncertain delayed nonlinear systems using reinforcement learning. ...
The "strategic" utility function is approximated by the critic WNN and is minimized by the action WNN. Adaptation laws are developed for the online tuning of wavelets parameters. ...
Figure 3. Actor-Critic Architecture. ...
doi:10.5815/ijisa.2012.02.03
fatcat:vlrywboo6vfm5ku7qtgr55oqse
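The entry above approximates a "strategic" utility with a critic wavelet neural network (WNN) and minimizes it with an action WNN. As a hedged sketch of what a single WNN output looks like, the following uses a Mexican-hat mother wavelet with tunable translations and dilations; the wavelet choice and dimensions are assumptions, not the paper's construction.

import numpy as np

def mexican_hat(z):
    """Mother wavelet psi(z) = (1 - z^2) * exp(-z^2 / 2)."""
    return (1.0 - z ** 2) * np.exp(-0.5 * z ** 2)

def wnn_output(x, weights, translations, dilations):
    """Scalar wavelet-network output y = sum_i w_i * psi(||x - b_i|| / a_i)."""
    z = np.linalg.norm(x - translations, axis=1) / dilations
    return float(weights @ mexican_hat(z))

# Hypothetical 2-dimensional input with 5 wavelet nodes.
rng = np.random.default_rng(0)
y = wnn_output(np.array([0.3, -0.1]),
               weights=rng.normal(size=5),
               translations=rng.uniform(-1, 1, (5, 2)),
               dilations=np.full(5, 0.7))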
Supplementary document for Deep Reinforcement Learning Control of White-Light Continuum Generation - 5026263.pdf
2021
figshare.com
ACTOR-CRITIC DEEP REINFORCEMENT LEARNING: Reinforcement Learning (RL) is a model-free control methodology that aims at controlling a dynamical system of the form s_{t+1} = h(s_t, a_t), s_t ∈ S, a_t ∈ A, ...
In this way, actor and critic NNs can be trained with relevant data for WLC generation. ...
doi:10.6084/m9.figshare.13611416.v1
fatcat:3dhpz2wctzgfjic527po2yxyy4
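To make the model-free setting in this entry concrete (the controller only observes transitions of s_{t+1} = h(s_t, a_t); it never sees h itself), here is a minimal interaction-loop sketch. The placeholder dynamics, quadratic reward, and random exploratory policy are assumptions; the actual white-light-continuum system is of course not a linear toy.

import numpy as np

def h(s, a):
    # Placeholder stand-in for the unknown dynamics s_{t+1} = h(s_t, a_t).
    return 0.9 * s + 0.1 * a

def rollout(policy, s0, T=50):
    """Collect (s, a, r, s') tuples purely by interacting with the system (model-free)."""
    s, transitions = s0, []
    for _ in range(T):
        a = policy(s)
        s_next = h(s, a)
        r = -float(np.sum(s_next ** 2))   # assumed reward: penalize distance from the target state 0
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions

# Exploratory random policy over a 3-dimensional action space; the collected data would
# then be used to train the actor and critic networks mentioned in the snippet.
rng = np.random.default_rng(0)
data = rollout(lambda s: rng.normal(size=3), s0=np.ones(3))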
Supplementary document for Deep Reinforcement Learning Control of White-Light Continuum Generation - 5026263.pdf
2021
figshare.com
ACTOR-CRITIC DEEP REINFORCEMENT LEARNING: Reinforcement Learning (RL) is a model-free control methodology that aims at controlling a dynamical system of the form s_{t+1} = h(s_t, a_t), s_t ∈ S, a_t ∈ A, ...
In this way, actor and critic NNs can be trained with relevant data for WLC generation. ...
doi:10.6084/m9.figshare.13611416.v2
fatcat:f5utxiybmrevxm375r7qp4bb5i
Adaptive Optimal Control via Continuous-Time Q-Learning for Unknown Nonlinear Affine Systems
2019
2019 IEEE 58th Conference on Decision and Control (CDC)
Adaptive critic for Q-function approximation: For the nonlinear affine system (1) with the Q-function (25), we approximate the Q-function using a critic neural network by Q(x, u) = W^T Φ(x, u) + ε_Q(x, ...
The method is termed integral reinforcement learning (IRL) [8], which employs two neural networks in a critic/actor configuration. ...
doi:10.1109/cdc40024.2019.9030116
dblp:conf/cdc/ChenH19
fatcat:fqukxgetbfcqxb4ficixn4oju4
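The snippet above approximates the Q-function with a critic that is linear in its weights, Q(x, u) ≈ W^T Φ(x, u), plus a reconstruction error ε_Q. A hedged sketch of that structure with an assumed quadratic feature map and a plain gradient update (not the paper's IRL tuning laws) follows.

import numpy as np

def phi(x, u):
    """Assumed feature map Phi(x, u): upper-triangular products of the stacked vector [x; u]."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(z.size)]

class LinearCritic:
    """Q(x, u) ~ W^T Phi(x, u); epsilon_Q is whatever the chosen feature span cannot capture."""
    def __init__(self, n_features, lr=1e-2):
        self.W = np.zeros(n_features)
        self.lr = lr

    def q(self, x, u):
        return float(self.W @ phi(x, u))

    def update(self, x, u, target):
        # One gradient step on the squared Bellman-style residual (target - Q)^2.
        e = target - self.q(x, u)
        self.W += self.lr * e * phi(x, u)
        return e

# Hypothetical 2-state, 1-input example.
x, u = np.array([0.5, -0.2]), np.array([0.1])
critic = LinearCritic(n_features=phi(x, u).size)
critic.update(x, u, target=-1.0)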
Issues on Stability of ADP Feedback Controllers for Dynamical Systems
2008
IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)
Index Terms-Adaptive/approximate dynamic programming (ADP), feedback controllers, neural networks (NNs), nonlinear control, stability. ...
Different versions of NN structures in the literature, which embed mathematical mappings related to solutions of the ADP-formulated problems called "adaptive critics" or "action-critic" networks, are discussed ...
For continuous state and action spaces, convergence results are more challenging as adaptive critics require the use of nonlinear function approximators. ...
doi:10.1109/tsmcb.2008.926599
pmid:18632377
fatcat:z55umjlzpjgd7eik2kfhapfqfy
Adaptive PID Controller Based on Reinforcement Learning for Wind Turbine Control
2008
Zenodo
In order to reduce the demand for storage space and to improve the learning efficiency, a single RBF neural network is used to approximate the policy function of the actor and the value function of the critic simultaneously ...
Actor-Critic learning is used to tune PID parameters in an adaptive way by taking advantage of the model-free and on-line learning properties of reinforcement learning effectively. ...
Actor-Critic Learning based on RBF Network: The RBF network is a kind of multi-layer feedforward neural network. ...
doi:10.5281/zenodo.1057789
fatcat:6ij65icq6fblba5hqsfa7k777u
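As a sketch of the space-saving idea in this entry — one RBF hidden layer shared by an actor head (here, increments to the PID gains) and a critic head (a value estimate) — consider the following. The centers, width, state definition, and TD-style weight updates are assumptions for illustration only.

import numpy as np

class SharedRBFActorCritic:
    """Single RBF hidden layer feeding both an actor output and a critic output."""
    def __init__(self, centers, width, n_actions, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.c, self.width, self.lr = centers, width, lr
        self.w_actor = rng.normal(0.0, 0.1, (n_actions, len(centers)))
        self.w_critic = np.zeros(len(centers))

    def hidden(self, s):
        # Gaussian radial basis activations shared by both heads.
        return np.exp(-np.sum((self.c - s) ** 2, axis=1) / (2 * self.width ** 2))

    def forward(self, s):
        h = self.hidden(s)
        return self.w_actor @ h, float(self.w_critic @ h)   # (dKp, dKi, dKd) proposal, V(s)

    def td_update(self, s, td_error, explored_delta):
        h = self.hidden(s)
        self.w_critic += self.lr * td_error * h              # value head tracks the TD target
        mu, _ = self.forward(s)
        self.w_actor += self.lr * td_error * np.outer(explored_delta - mu, h)

# Hypothetical state (tracking error, its derivative) and three PID-gain increments.
rng = np.random.default_rng(1)
net = SharedRBFActorCritic(centers=rng.uniform(-1, 1, (25, 2)), width=0.5, n_actions=3)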
Reinforcement learning and adaptive dynamic programming for feedback control
2009
IEEE Circuits and Systems Magazine
One class of reinforcement learning methods is based on the Actor-Critic structure [Barto, Sutton, Anderson 1983], where an actor component applies an action or control policy to the environment, and ...
Therefore, it is of interest to study reinforcement learning systems having an actor-critic structure wherein the critic assesses the value of current policies based on some sort of optimality criteria ...
The resulting structure for reinforcement Q-learning is the same as the actor-critic system shown in Figure 2. ...
doi:10.1109/mcas.2009.933854
fatcat:qldyoe4lizbgfj55nthjwyujpy
Learning-based Hamilton-Jacobi-Bellman Methods for Optimal Control
[article]
2019
arXiv
pre-print
However, when validated solutions of TPBVPs are not available, the reinforcement learning method is applied to solve HJB by constructing a neural network, defining a reward function, and setting appropriate ...
After obtaining a trained neural network from supervised learning, we are able to find proper initial adjoint variables for given boundary conditions in real-time. ...
In each level, there are two neural networks, an actor network and a critic network. ...
arXiv:1907.10097v1
fatcat:ymws4w7ma5f27baajjrtyrnjey
Reinforcement learning with via-point representation
2004
Neural Networks
In this paper, we propose a new learning framework for motor control. This framework consists of two components: reinforcement learning and via-point representation. ...
In the field of motor control, conventional reinforcement learning has been used to acquire control sequences such as cart-pole or stand-up robot control. ...
Relationship between the keep time at the inverted position and the trial number with the conventional actor-critic framework. t_up denotes the time in which the pole stayed up (cos(θ) > cos(π/4)). The ...
doi:10.1016/j.neunet.2003.11.004
pmid:15037348
fatcat:ax2o2aupuvcqtg53g62qkzvnwu
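The figure caption quoted above defines the "keep time" through the condition cos(θ) > cos(π/4), i.e. the pole counts as upright while |θ| < π/4. Below is a small worked check of how that keep time would be accumulated from a sampled angle trajectory (the 10 ms sampling interval is an assumption).

import numpy as np

def keep_time(theta, dt=0.01):
    """Total time the pole satisfies cos(theta) > cos(pi/4), i.e. |theta| < pi/4."""
    return float(np.sum(np.cos(theta) > np.cos(np.pi / 4)) * dt)

# Worked example: upright (0.1 rad) for the first second, fallen (1.0 rad) afterwards.
t = np.arange(0.0, 2.0, 0.01)
theta = np.where(t < 1.0, 0.1, 1.0)
print(keep_time(theta))   # -> 1.0 second of keep time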
Risk Conditioned Neural Motion Planning
[article]
2021
arXiv
pre-print
Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators. ...
Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. ...
Soft Actor-Critic (SAC) [24] is an off-policy actor-critic deep reinforcement learning algorithm based on maximum entropy reinforcement learning. ...
arXiv:2108.01851v1
fatcat:7iocv7sss5fkbasymyrcnhhmdi
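For orientation on the SAC machinery this entry builds on, the critic regresses onto an entropy-regularized ("soft") Bellman target. The sketch below only computes that target for one transition; the twin-critic minimum, discount, and temperature values are generic SAC conventions with assumed numbers, not the authors' risk-conditioned planner.

import numpy as np

def soft_target(r, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """y = r + gamma * ( min(Q1(s',a'), Q2(s',a')) - alpha * log pi(a'|s') ), with a' ~ pi(.|s')."""
    return r + gamma * (np.minimum(q1_next, q2_next) - alpha * logp_next)

# Assumed numbers: reward 1.0, twin critic estimates 5.2 and 5.0, log-probability -1.3.
print(soft_target(1.0, 5.2, 5.0, -1.3))   # 1 + 0.99 * (5.0 + 0.26) ~= 6.21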
Toward Reliable Designs of Data-Driven Reinforcement Learning Tracking Control for Euler-Lagrange Systems
[article]
2021
arXiv
pre-print
We provide a theoretical guarantee for the stability of the overall dynamic system, weight convergence of the approximating nonlinear neural networks, and the Bellman (sub)optimality of the resulting control ...
We develop this work based on an established direct heuristic dynamic programming (dHDP) learning paradigm to perform online learning and adaptation and a backstepping design for a class of important nonlinear ...
Hyperbolic tangent is used as the transfer function in the actor-critic networks to approximate the control policy and the cost-to-go function. 1) Critic Neural Network: The critic neural network (CNN) ...
arXiv:2101.00068v2
fatcat:3l25h6nnxvbo5irnglb7irbtvq
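To illustrate the tanh-based critic described above, the following hedged sketch evaluates a cost-to-go estimate J(s) with a hyperbolic-tangent hidden layer and forms a dHDP-style critic prediction error α·J(t) − (J(t−1) − r(t)) that the weight updates would drive toward zero. Layer sizes and the discount α are assumptions.

import numpy as np

def tanh_critic(s, W1, W2):
    """Cost-to-go estimate J(s) = W2 @ tanh(W1 @ s), i.e. a tanh transfer function in the hidden layer."""
    return float(W2 @ np.tanh(W1 @ s))

def critic_prediction_error(J_t, J_prev, r_t, alpha=0.95):
    """dHDP-style residual the critic is trained to null: alpha*J(t) - (J(t-1) - r(t))."""
    return alpha * J_t - (J_prev - r_t)

# Hypothetical 4-dimensional state and an 8-unit hidden layer.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (8, 4)), rng.normal(0, 0.1, 8)
J_prev = tanh_critic(np.zeros(4), W1, W2)
J_t = tanh_critic(rng.normal(size=4), W1, W2)
e_c = critic_prediction_error(J_t, J_prev, r_t=-0.5)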
Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems
2014
International Journal of Adaptive Control and Signal Processing
Novel update laws are derived for adaptation of the critic and actor NN weights. ...
The proposed algorithm is implemented on an actor-critic-disturbance NN approximator structure to obtain the solution of the Hamilton-Jacobi-Isaacs equation online forward in time. ...
to solve in case of nonlinear systems. ...
doi:10.1002/acs.2485
fatcat:7g45wdvzcbcsrhbo6coqdmbpqa
Reinforcement learning and optimal adaptive control: An overview and implementation examples
2012
Annual Reviews in Control
The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint ...
Reinforcement learning is bridging the gap between traditional optimal control, adaptive control and bio-inspired learning techniques borrowed from animals. ...
They have used two NNs (which is the case in most adaptive critic/actor-critic structures), one for the critic and one for the actor, approximating the policy. ...
doi:10.1016/j.arcontrol.2012.03.004
fatcat:etqao7m4efccdkh66zhv7piphu
Showing results 1 — 15 out of 4,638 results