4,638 Hits in 5.0 sec

Neural Network Reinforcement Learning for Walking Control of a 3-Link Biped Robot

Ahmad Ghanbari, Yasaman Vaghei, Sayyed Mohammad Reza Sayyed Noorani
2015 International Journal of Engineering and Technology  
The adaptive control agent consists of two neural network units, known as actor and critic, for learning prediction and learning control tasks.  ...  Reinforcement Learning (RL) is one of these major techniques, which has been widely used in robot control approaches.  ...  The proposed controller is an actor-critic reinforcement learning unit, in which the actor and the critic are two three-layered feedforward neural networks with variable network weights.  ... 
doi:10.7763/ijet.2015.v7.835 fatcat:txj2ogwyzve5hd2okwks3bn3gu
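
The actor-critic arrangement described in these two entries can be illustrated with a minimal sketch. It assumes "three-layered" means input, one tanh hidden layer, and a linear output layer, and that the critic is trained on the temporal-difference (TD) error $\delta = r + \gamma V(s') - V(s)$. The class `ThreeLayerNet`, the helpers `td_error` and `critic_output_step`, the discount `gamma`, and the learning rate are hypothetical; the actor's own update rule is omitted because the snippets do not specify it.

```python
import math
import random


class ThreeLayerNet:
    """Input -> tanh hidden -> linear output network (hypothetical illustration)."""

    def __init__(self, n_in, n_hidden, n_out, scale=0.1, seed=0):
        rng = random.Random(seed)
        # w1[j][i]: input i -> hidden unit j; w2[k][j]: hidden unit j -> output k
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]

    def forward(self, x):
        h = self.hidden(x)
        return [sum(w * hj for w, hj in zip(row, h)) for row in self.w2]


def td_error(critic, s, r, s_next, gamma=0.95):
    """delta = r + gamma * V(s') - V(s), the critic's temporal-difference error."""
    return r + gamma * critic.forward(s_next)[0] - critic.forward(s)[0]


def critic_output_step(critic, s, delta, lr=0.01):
    """Semi-gradient TD(0) step on the critic's output weights only (a simplification)."""
    h = critic.hidden(s)
    critic.w2[0] = [w + lr * delta * hj for w, hj in zip(critic.w2[0], h)]


critic = ThreeLayerNet(n_in=3, n_hidden=5, n_out=1, seed=1)  # value estimate V(s)
actor = ThreeLayerNet(n_in=3, n_hidden=5, n_out=2, seed=2)   # e.g. two joint torques
s, s_next = [0.1, -0.2, 0.05], [0.12, -0.18, 0.04]
torques = actor.forward(s)
delta = td_error(critic, s, r=1.0, s_next=s_next)
critic_output_step(critic, s, delta)
```

Only the critic's output layer is adapted here to keep the gradient obvious (the derivative of $V$ with respect to each output weight is the corresponding hidden activation); a full implementation would also backpropagate into the hidden layer.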

Actor-critic neural network reinforcement learning for walking control of a 5-link bipedal robot

Yasaman Vaghei, Ahmad Ghanbari, Sayyed Mohammad Reza Sayyed Noorani
2014 2014 Second RSI/ISM International Conference on Robotics and Mechatronics (ICRoM)  
Moreover, since the neural networks are implemented in both of the actor and the critic sections, we have added a learning database to reduce the probability of inaccurate approximation of the nonlinear  ...  Our control agent consists of two three-layered neural network units, known as the critic and the actor for learning prediction and learning control tasks.  ...  This controller is an actor-critic reinforcement learning unit, in which the actor and the critic are two three-layered feedforward neural networks.  ... 
doi:10.1109/icrom.2014.6990997 fatcat:i35kwwhqlrfvbd6pgv2cpm2lbu

Wavelet Neural Network Observer Based Adaptive Tracking Control for a Class of Uncertain Nonlinear Delayed Systems Using Reinforcement Learning

Manish Sharma, Ajay Verma
2012 International Journal of Intelligent Systems and Applications  
This paper is concerned with the observer designing problem for a class of uncertain delayed nonlinear systems using reinforcement learning.  ...  The "strategic" utility function is approximated by the critic WNN and is minimized by the action WNN. Adaptation laws are developed for the online tuning of wavelet parameters.  ...  CONCLUSION  ...  Figure 3. Actor-Critic Architecture  ... 
doi:10.5815/ijisa.2012.02.03 fatcat:vlrywboo6vfm5ku7qtgr55oqse
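
A wavelet neural network (WNN) replaces the sigmoidal hidden units of a feedforward network with dilated and translated copies of a mother wavelet, and the adaptation laws mentioned in the snippet tune those parameters online. A minimal forward-pass sketch, assuming a scalar input and the Mexican-hat mother wavelet $\psi(t) = (1 - t^2)e^{-t^2/2}$ (the paper's choice of wavelet is not given in the snippet); `mexican_hat`, `wnn_output`, and the example parameter values are hypothetical.

```python
import math


def mexican_hat(t):
    """Mexican-hat mother wavelet psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wnn_output(x, weights, translations, dilations):
    """y(x) = sum_i w_i * psi((x - b_i) / a_i) for a scalar input x."""
    return sum(
        w * mexican_hat((x - b) / a)
        for w, b, a in zip(weights, translations, dilations)
    )


# Three hidden wavelet units; an adaptive scheme would tune w, b and a online.
weights = [0.5, -1.0, 2.0]
translations = [-1.0, 0.0, 1.0]
dilations = [1.0, 0.5, 2.0]
print(wnn_output(0.3, weights, translations, dilations))
```

The translation $b_i$ places each unit on the input axis and the dilation $a_i$ sets its width, which is what gives a WNN its localized, multi-resolution approximation.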

Supplementary document for Deep Reinforcement Learning Control of White-Light Continuum Generation - 5026263.pdf

Carlo Valensise, Alessandro Giuseppi, Giulio Cerullo, Dario Polli
2021 figshare.com  
ACTOR-CRITIC DEEP REINFORCEMENT LEARNING Reinforcement Learning (RL) is a model-free control methodology that aims at controlling a dynamical system of the form $s_{t+1} = h(s_t, a_t)$, $s_t \in S$, $a_t \in A$,  ...  In this way, actor and critic NNs can be trained with relevant data for WLC generation.  ... 
doi:10.6084/m9.figshare.13611416.v1 fatcat:3dhpz2wctzgfjic527po2yxyy4

Supplementary document for Deep Reinforcement Learning Control of White-Light Continuum Generation - 5026263.pdf

Carlo Valensise, Alessandro Giuseppi, Giulio Cerullo, Dario Polli
2021 figshare.com  
ACTOR-CRITIC DEEP REINFORCEMENT LEARNING Reinforcement Learning (RL) is a model-free control methodology that aims at controlling a dynamical system of the form $s_{t+1} = h(s_t, a_t)$, $s_t \in S$, $a_t \in A$,  ...  In this way, actor and critic NNs can be trained with relevant data for WLC generation.  ... 
doi:10.6084/m9.figshare.13611416.v2 fatcat:f5utxiybmrevxm375r7qp4bb5i

Adaptive Optimal Control via Continuous-Time Q-Learning for Unknown Nonlinear Affine Systems

Anthony Siming Chen, Guido Herrmann
2019 2019 IEEE 58th Conference on Decision and Control (CDC)  
Adaptive critic for Q-function approximation For the nonlinear affine system (1) with the Q-function (25), we approximate the Q-function using a critic neural network by $Q(x, u) = W^{T}\Phi(x, u) + \varepsilon_Q(x,$  ...  The method is termed as integral reinforcement learning (IRL) [8] which employs two neural networks in a critic/actor configuration.  ... 
doi:10.1109/cdc40024.2019.9030116 dblp:conf/cdc/ChenH19 fatcat:fqukxgetbfcqxb4ficixn4oju4
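
The linear-in-parameters critic $Q(x, u) = W^{T}\Phi(x, u)$ can be sketched as follows, ignoring the residual $\varepsilon_Q$ and assuming a scalar state and input with the quadratic basis $\Phi(x, u) = [x^2, xu, u^2]^{T}$ (a common choice for near-quadratic costs, not necessarily the basis used in the paper). Because this $Q$ is quadratic in $u$, setting $\partial Q / \partial u = W_2 x + 2 W_3 u = 0$ gives the minimizing action $u^{*} = -W_2 x / (2 W_3)$ when $W_3 > 0$. The functions `phi`, `q_value`, `greedy_action`, and `gradient_step` are hypothetical.

```python
def phi(x, u):
    """Assumed quadratic basis Phi(x, u) = [x^2, x*u, u^2]."""
    return [x * x, x * u, u * u]


def q_value(W, x, u):
    """Critic approximation Q(x, u) = W^T Phi(x, u), residual eps_Q ignored."""
    return sum(w * f for w, f in zip(W, phi(x, u)))


def greedy_action(W, x):
    """Minimiser of Q over u, from dQ/du = W[1]*x + 2*W[2]*u = 0."""
    if W[2] <= 0:
        raise ValueError("Q must be strictly convex in u (W[2] > 0)")
    return -W[1] * x / (2.0 * W[2])


def gradient_step(W, x, u, target, lr=0.1):
    """One least-mean-squares step pulling Q(x, u) toward a target value."""
    err = target - q_value(W, x, u)
    return [w + lr * err * f for w, f in zip(W, phi(x, u))]


W = [1.0, 2.0, 1.0]
u_star = greedy_action(W, 1.0)
W = gradient_step(W, x=1.0, u=u_star, target=0.5)
```

Linearity in $W$ is what makes the tuning laws tractable: the gradient of $Q$ with respect to the weights is simply $\Phi(x, u)$.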

Issues on Stability of ADP Feedback Controllers for Dynamical Systems

S.N. Balakrishnan, Jie Ding, F.L. Lewis
2008 IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)  
Index Terms: Adaptive/approximate dynamic programming (ADP), feedback controllers, neural networks (NNs), nonlinear control, stability.  ...  Different versions of NN structures in the literature, which embed mathematical mappings related to solutions of the ADP-formulated problems called "adaptive critics" or "action-critic" networks, are discussed  ...  For continuous state and action spaces, convergence results are more challenging as adaptive critics require the use of nonlinear function approximators.  ... 
doi:10.1109/tsmcb.2008.926599 pmid:18632377 fatcat:z55umjlzpjgd7eik2kfhapfqfy

Adaptive PID Controller Based on Reinforcement Learning for Wind Turbine Control

M. Sedighizadeh, A. Rezazadeh
2008 Zenodo  
In order to reduce the demand of storage space and to improve the learning efficiency, a single RBF neural network is used to approximate the policy function of Actor and the value function of Critic simultaneously  ...  Actor-Critic learning is used to tune PID parameters in an adaptive way by taking advantage of the model-free and on-line learning properties of reinforcement learning effectively.  ...  Actor-Critic Learning based on RBF Network The RBF network is a kind of multi-layer feedforward neural network.  ... 
doi:10.5281/zenodo.1057789 fatcat:6ij65icq6fblba5hqsfa7k777u
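
Using one RBF network for both roles can be sketched as a shared Gaussian hidden layer feeding two linear heads: an actor head that outputs the PID gains $(K_P, K_I, K_D)$ and a critic head that outputs the value estimate $V$. The incremental PID law below, $\Delta u_k = K_P(e_k - e_{k-1}) + K_I e_k + K_D(e_k - 2e_{k-1} + e_{k-2})$, is a common discrete form and an assumption here, as are the network input (for example, recent tracking errors) and all names (`rbf_hidden`, `SharedRBFActorCritic`, `incremental_pid`).

```python
import math


def rbf_hidden(x, centers, widths):
    """Gaussian RBF units: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma_j^2))."""
    out = []
    for c, sigma in zip(centers, widths):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out.append(math.exp(-dist2 / (2.0 * sigma * sigma)))
    return out


class SharedRBFActorCritic:
    """One shared RBF hidden layer, two linear output heads (hypothetical structure)."""

    def __init__(self, centers, widths, actor_weights, critic_weights):
        self.centers, self.widths = centers, widths
        self.actor_weights = actor_weights    # three rows: K_P, K_I, K_D
        self.critic_weights = critic_weights  # one row: V

    def __call__(self, x):
        h = rbf_hidden(x, self.centers, self.widths)
        gains = [sum(w * hj for w, hj in zip(row, h)) for row in self.actor_weights]
        value = sum(w * hj for w, hj in zip(self.critic_weights, h))
        return gains, value


def incremental_pid(gains, e_k, e_k1, e_k2):
    """Control increment Delta u_k of the velocity-form PID law."""
    kp, ki, kd = gains
    return kp * (e_k - e_k1) + ki * e_k + kd * (e_k - 2.0 * e_k1 + e_k2)


net = SharedRBFActorCritic(
    centers=[[0.0, 0.0], [0.5, -0.5]], widths=[1.0, 0.5],
    actor_weights=[[1.0, 0.2], [0.5, 0.1], [0.1, 0.0]], critic_weights=[2.0, -1.0])
gains, value = net([0.1, -0.05])
du = incremental_pid(gains, e_k=0.1, e_k1=0.15, e_k2=0.2)
```

Sharing the hidden layer means the Gaussian units are stored and evaluated once per step, which matches the snippet's stated motivation of reducing storage and improving learning efficiency.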

Reinforcement learning and adaptive dynamic programming for feedback control

Frank L. Lewis, Draguna Vrabie
2009 IEEE Circuits and Systems Magazine  
One class of reinforcement learning methods is based on the Actor-Critic structure [Barto, Sutton, Anderson 1983], where an actor component applies an action or control policy to the environment, and  ...  Therefore, it is of interest to study reinforcement learning systems having an actor-critic structure wherein the critic assesses the value of current policies based on some sort of optimality criteria  ...  The resulting structure for reinforcement Q learning is the same as the actor-critic system shown in Figure 2.  ... 
doi:10.1109/mcas.2009.933854 fatcat:qldyoe4lizbgfj55nthjwyujpy
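
The actor-critic reading of policy iteration can be illustrated on a scalar discrete-time linear-quadratic problem, $x_{k+1} = a x_k + b u_k$ with stage cost $q x_k^2 + r u_k^2$. The critic evaluates the current policy $u = -Kx$ as $V(x) = P x^2$ with $P = (q + rK^2)/(1 - (a - bK)^2)$, and the actor improves it with $K \leftarrow bPa/(r + b^2 P)$. This sketch evaluates the policy from the model for brevity, whereas the reinforcement learning variants surveyed in the article estimate the same quantities from measured data; the function name `policy_iteration` is hypothetical.

```python
def policy_iteration(a, b, q, r, K0, iters=50, tol=1e-12):
    """Policy iteration for x_{k+1} = a*x + b*u with cost sum(q*x^2 + r*u^2).

    K0 must be stabilising (|a - b*K0| < 1). Returns the gain K of the
    policy u = -K*x and the value coefficient P, where V(x) = P*x^2.
    """
    K, P = K0, float("inf")
    for _ in range(iters):
        closed_loop = a - b * K
        if abs(closed_loop) >= 1.0:
            raise ValueError("policy is not stabilising")
        # Critic: evaluate V(x) = P_new * x^2 for the current policy u = -K x
        P_new = (q + r * K * K) / (1.0 - closed_loop ** 2)
        # Actor: greedy improvement with respect to the evaluated critic
        K = b * P_new * a / (r + b * b * P_new)
        if abs(P_new - P) < tol:
            return K, P_new
        P = P_new
    return K, P


K, P = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, K0=0.5)
print(K, P)
```

For $a = b = q = r = 1$ the iterates converge to the solution of the discrete-time algebraic Riccati equation, $P = (1 + \sqrt{5})/2$ and $K = (\sqrt{5} - 1)/2$, starting from any stabilizing gain.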

Learning-based Hamilton-Jacobi-Bellman Methods for Optimal Control [article]

Sixiong You, Ran Dai, Ping Lu
2019 arXiv   pre-print
However, when validated solutions of TPBVPs are not available, the reinforcement learning method is applied to solve HJB by constructing a neural network, defining a reward function, and setting appropriate  ...  After obtaining a trained neural network from supervised learning, we are able to find proper initial adjoint variables for given boundary conditions in real-time.  ...  In each level, there are two neural networks, actor network and critic network.  ... 
arXiv:1907.10097v1 fatcat:ymws4w7ma5f27baajjrtyrnjey

Reinforcement learning with via-point representation

Hiroyuki Miyamoto, Jun Morimoto, Kenji Doya, Mitsuo Kawato
2004 Neural Networks  
In this paper, we propose a new learning framework for motor control. This framework consists of two components: reinforcement learning and via-point representation.  ...  In the field of motor control, conventional reinforcement learning has been used to acquire control sequences such as cart-pole or stand-up robot control.  ...  Relationship between the keep time at the inverted position and the trial number with the conventional actor-critic framework. $t_{up}$ denotes the time in which the pole stayed up ($\cos\theta > \cos(\pi/4)$). The  ... 
doi:10.1016/j.neunet.2003.11.004 pmid:15037348 fatcat:ax2o2aupuvcqtg53g62qkzvnwu

Risk Conditioned Neural Motion Planning [article]

Xin Huang, Meng Feng, Ashkan Jasour, Guy Rosman, Brian Williams
2021 arXiv   pre-print
Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators.  ...  Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks.  ...  Soft Actor Critic Soft Actor Critic (SAC) [24] is an off-policy actor critic deep reinforcement learning algorithm based on max entropy reinforcement learning.  ... 
arXiv:2108.01851v1 fatcat:7iocv7sss5fkbasymyrcnhhmdi
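
The maximum-entropy objective behind SAC adds a policy-entropy bonus weighted by a temperature $\alpha$. For a finite action set the soft state value is $V(s) = \mathbb{E}_{a \sim \pi}[Q(s,a) - \alpha \log \pi(a \mid s)]$; the policy that maximizes it is the Boltzmann distribution $\pi(a \mid s) \propto \exp(Q(s,a)/\alpha)$, for which $V(s) = \alpha \log \sum_a \exp(Q(s,a)/\alpha)$. SAC itself works with continuous actions and sampled estimates, so the discrete version below is only an illustration; its function names are hypothetical.

```python
import math


def boltzmann_policy(q_values, alpha):
    """pi(a) proportional to exp(Q(a) / alpha), computed with a max shift for stability."""
    m = max(q_values)
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values, policy, alpha):
    """V = E_{a~pi}[Q(a) - alpha * log pi(a)]; zero-probability actions contribute 0."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, policy) if p > 0.0)


def log_sum_exp_value(q_values, alpha):
    """Closed form alpha * log(sum_a exp(Q(a) / alpha)) of the optimal soft value."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


q_values = [1.0, 2.0, 0.5]
pi = boltzmann_policy(q_values, alpha=0.5)
print(pi, soft_value(q_values, pi, 0.5), log_sum_exp_value(q_values, 0.5))
```

Raising $\alpha$ flattens the policy toward uniform (more exploration), while $\alpha \to 0$ recovers the greedy policy and the ordinary value $\max_a Q(s,a)$.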

Toward Reliable Designs of Data-Driven Reinforcement Learning Tracking Control for Euler-Lagrange Systems [article]

Zhikai Yao, Jennie Si, Ruofan Wu, Jianyong Yao
2021 arXiv   pre-print
We provide a theoretical guarantee for the stability of the overall dynamic system, weight convergence of the approximating nonlinear neural networks, and the Bellman (sub)optimality of the resulting control  ...  We develop this work based on an established direct heuristic dynamic programming (dHDP) learning paradigm to perform online learning and adaptation and a backstepping design for a class of important nonlinear  ...  Hyperbolic tangent is used as the transfer function in the actor-critic networks to approximate the control policy and the cost-to-go function. 1) Critic Neural Network: The critic neural network (CNN)  ... 
arXiv:2101.00068v2 fatcat:3l25h6nnxvbo5irnglb7irbtvq

Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems

Sholeh Yasini, Ali Karimpour, Mohammad-Bagher Naghibi Sistani, Hamidreza Modares
2014 International Journal of Adaptive Control and Signal Processing  
Novel update laws are derived for adaptation of the critic and actor NN weights.  ...  The proposed algorithm is implemented on an actor-critic-disturbance NN approximator structure to obtain the solution of the Hamilton-Jacobi-Isaacs equation online forward in time.  ...  to solve in case of nonlinear systems.  ... 
doi:10.1002/acs.2485 fatcat:7g45wdvzcbcsrhbo6coqdmbpqa
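
In the linear-quadratic special case the Hamilton-Jacobi-Isaacs equation reduces to a game algebraic Riccati equation, which makes the saddle-point structure easy to see. For a scalar system $\dot{x} = ax + bu + kd$ with cost $\int_0^\infty (qx^2 + ru^2 - \gamma^2 d^2)\,dt$, the value $V(x) = Px^2$ satisfies $2aP + q - P^2(b^2/r - k^2/\gamma^2) = 0$; the controller minimizes with $u = -(b/r)Px$ and the disturbance maximizes with $d = (k/\gamma^2)Px$. The sketch below solves that scalar equation in closed form. It is only the LQ analogue: the cited paper approximates the nonlinear HJI solution online with critic, actor, and disturbance networks. The function name `scalar_game_riccati` is hypothetical.

```python
import math


def scalar_game_riccati(a, b, k, q, r, gamma):
    """Positive root P of 2aP + q - P^2 (b^2/r - k^2/gamma^2) = 0.

    Returns (P, K_u, K_d) so that u = -K_u * x and d = K_d * x.
    This sketch assumes b^2/r > k^2/gamma^2 (gamma large enough).
    """
    c = b * b / r - k * k / (gamma * gamma)
    if c <= 0:
        raise ValueError("sketch assumes b^2/r > k^2/gamma^2")
    P = (a + math.sqrt(a * a + c * q)) / c
    return P, b * P / r, k * P / (gamma * gamma)


P, K_u, K_d = scalar_game_riccati(a=0.5, b=1.0, k=1.0, q=1.0, r=1.0, gamma=2.0)
# Closed loop under both saddle-point policies: a - b*K_u + k*K_d = -sqrt(a^2 + c*q)
print(P, K_u, K_d)
```

Choosing the positive root makes the closed loop $\dot{x} = (a - bK_u + kK_d)x$ stable, since that coefficient equals $-\sqrt{a^2 + cq}$ with $c = b^2/r - k^2/\gamma^2$.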

Reinforcement learning and optimal adaptive control: An overview and implementation examples

Said G. Khan, Guido Herrmann, Frank L. Lewis, Tony Pipe, Chris Melhuish
2012 Annual Reviews in Control  
The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint  ...  Reinforcement learning is bridging the gap between traditional optimal control, adaptive control and bio-inspired learning techniques borrowed from animals.  ...  They have used two NNs (which is the case in most adaptive critic/actor-critic structures), one for the critic and one for the actor, approximating the policy.  ... 
doi:10.1016/j.arcontrol.2012.03.004 fatcat:etqao7m4efccdkh66zhv7piphu