Model learning actor-critic algorithms: Performance evaluation in a motion control task
2012
2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the learning by constructing a model online through local linear regression (LLR). ...
Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. ...
Model Learning Actor-Critic In addition to learning the actor and critic functions, the Model Learning Actor-Critic (MLAC) method learns an approximate process model x' = f_ζ(x, u). ...
doi:10.1109/cdc.2012.6426427
dblp:conf/cdc/GrondmanBB12
fatcat:iihtxvfeg5bdfg6c4bsvksf4jm
Actor-Critic Control with Reference Model Learning
2011
IFAC Proceedings Volumes
We propose a new actor-critic algorithm for reinforcement learning. ...
The algorithm does not use an explicit actor, but learns a reference model which represents a desired behaviour, along which the process is to be controlled by using the inverse of a learned process model ...
EXAMPLE: PENDULUM SWINGUP To evaluate and compare the performance of our algorithm, we apply it to the task of learning to swing up a simulated inverted pendulum and compare it to the standard algorithm ...
doi:10.3182/20110828-6-it-1002.00759
fatcat:b6wr37lrm5cnznr7fic3b46pcm
On the Role of Models in Learning Control: Actor-Critic Iterative Learning Control
[article]
2020
arXiv
pre-print
These basis functions encode implicit model knowledge and the actor-critic algorithm learns the feedforward parameters without explicitly using a model. ...
The developed actor-critic iterative learning control (ACILC) framework uses a feedforward parameterization with basis functions. ...
Actor-critic learning The actor-critic algorithm for the solution of the optimal control problem is presented here. ...
arXiv:2007.00430v2
fatcat:4fwlzluvkjh4xbwse5oz4k3jta
Model-based Actor-critic Learning of Robotic Impedance Control in Complex Interactive Environment
2021
IEEE Transactions on Industrial Electronics
To learn the interactive skill, a model-based actor-critic learning algorithm and a safety-learning strategy are proposed in this paper to find the optimal impedance control, in which the learning process ...
The effectiveness of the learning algorithm and the performance of the learned impedance control are validated in a UR5 robot. ...
To realize RL of robotic impedance control, a model-based actor-critic (AC) learning algorithm and a safety-learning strategy are proposed in this paper. ...
doi:10.1109/tie.2021.3134082
fatcat:twbmcfnvi5aw5kamnatsc7hwfy
Reinforcement Learning Algorithms in Humanoid Robotics
[chapter]
2007
Humanoid Robots: New Developments
Finally, one ought to point out that the problem of motion of humanoid robots is a very complex control task, especially when the real environment is taken into account, requiring as a minimum, its integration ...
All the mentioned characteristics have to be taken into account in the synthesis of advanced control algorithms that accomplish stable, fast and reliable performance of humanoid robots. ...
Acknowledgments The work described here was conducted within the national research project "Dynamic and Control of High-Performance Humanoid Robots: Theory and Application" and was funded ...
doi:10.5772/4878
fatcat:brp43tnj35c2lmalaawtv5gv6u
UAV maneuvering decision-making algorithm based on Twin Delayed Deep Deterministic Policy Gradient Algorithm
2021
Journal of Artificial Intelligence and Technology
The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental ...
Aiming at intelligent decision-making of UAV based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. ...
... which makes the TD3 algorithm perform better than DDPG in many continuous control tasks. ...
doi:10.37965/jait.2021.12003
fatcat:nq3r65srfze43pnfias4oiri4u
Motion Control of Unmanned Underwater Vehicles Via Deep Imitation Reinforcement Learning Algorithm
2020
IET Intelligent Transport Systems
In this study, a motion control algorithm based on deep imitation reinforcement learning is proposed for the unmanned underwater vehicles (UUVs). ...
The deep reinforcement learning employs an actor-critic architecture. The actor part executes the control strategy and the critic part evaluates the current control strategy. ...
The motion control of UUV is a continuous control task, so the TD3 based on actor-critic architecture is adopted. ...
doi:10.1049/iet-its.2019.0273
fatcat:3fjyronvanh6zlhkhur3gjg6wq
Soft Actor-Critic Algorithms and Applications
[article]
2019
arXiv
pre-print
Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. ...
In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. ...
Acknowledgments We would like to thank Vitchyr Pong and Haoran Tang for constructive discussions during the development of soft actor-critic, Vincent Vanhoucke for his support towards the project at Google ...
arXiv:1812.05905v2
fatcat:dw4325vci5ezzpfyfrtulkpko4
Smart Train Operation Algorithms based on Expert Knowledge and Reinforcement Learning
[article]
2021
arXiv
pre-print
Compared with previous works, the proposed algorithms can realize the control of continuous action for the subway system and optimize multiple critical objectives without using an offline speed profile ...
operation algorithm are better than expert manual driving and existing ATO algorithms in terms of energy efficiency. ...
These concerns make energy efficiency play a core role in the design of our control model. ...
arXiv:2003.03327v3
fatcat:qa6pddzifjghzmwru3yv5cr2im
A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous Vehicles
[article]
2021
arXiv
pre-print
In this survey, we systematically summarize the current literature on studies that apply reinforcement learning (RL) to the motion planning and control of autonomous vehicles. ...
However, this approach does not automatically guarantee maximal performance due to the lack of a system-level optimization. ...
The actor-critic algorithm has been successfully applied to both discrete behavior decision-making and continuous motion planning tasks [15], [16]. ...
arXiv:2105.14218v2
fatcat:27glt4i4lfhg3j4ozjrlsq6i3e
Machine Learning Algorithms in Bipedal Robot Control
2012
IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)
This paper gives a review of recent advances in state-of-the-art learning algorithms and their applications to bipedal robot control. ...
In the past decades, machine learning techniques, such as supervised learning, reinforcement learning, and unsupervised learning, have been increasingly used in the control engineering community. ...
With a classical control approach, a robot is explicitly programmed to perform the desired task using a complete mathematical model of the robot and its environment. ...
doi:10.1109/tsmcc.2012.2186565
fatcat:tchoesxg6rc2vkuh7gtlxxfosa
Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance
2020
Sensors
To evaluate the performance of the algorithm, we use the MAJPPO algorithm to complete the task of multi-UAV formation and the crossing of multiple-obstacle environments. ...
Aiming at the problem of a non-stationary environment caused by the change of learning agent strategy in reinforcement learning in a multi-agent environment, the paper presents an improved multiagent reinforcement ...
In order to learn a stable control model, a reasonable reward function structure is necessary. ...
doi:10.3390/s20164546
pmid:32823783
fatcat:eqielsplina5xeyenrosylf3cu
Real Time Robot Policy Adaptation Based on Intelligent Algorithms
[chapter]
2011
IFIP Advances in Information and Communication Technology
The proposed algorithm is evaluated in the simulated environment of the Cyber Rodent (CR) robot, where the robot has to increase its energy level by capturing the active battery packs. ...
In this paper we present a new method for robot real time policy adaptation by combining learning and evolution. The robot adapts the policy as the environment conditions change. ...
In order to test the effectiveness of the proposed algorithm, we considered a biologically inspired task for the CR robot ([13]). ...
doi:10.1007/978-3-642-23960-1_1
fatcat:4uiz3gxkvnaqrcfh6l5q6yokwe
Reinforcement Learning Under Algorithmic Triage
[article]
2021
arXiv
pre-print
To this end, we look at the problem through the framework of options and develop a two-stage actor-critic method to learn reinforcement learning models under triage. ...
In this work, we take a first step towards developing reinforcement learning models that are optimized to operate under algorithmic triage. ...
In learning under algorithmic triage, one has to find not only a machine learning model but also a triage policy that determines who decides, the model or the human, and when. ...
arXiv:2109.11328v1
fatcat:suftzvuqtba45pk5yyxmm6zloi
Adaptive Inverse Optimal Control for Rehabilitation Robot Systems Using Actor-Critic Algorithm
2014
Mathematical Problems in Engineering
To this goal, a new adaptive inverse optimal hybrid control (AHC) combining inverse optimal control and actor-critic learning is proposed. ...
Then, based on this model, an open-loop error system is formed; thereafter, an inverse optimal control input is designed to minimize the cost functional and a NN-based actor-critic feedforward signal is ...
Then, according to the evaluation result, we make a suitable training task, as shown in Figure 8. ...
doi:10.1155/2014/285248
fatcat:rrj4mynikzdebduulpylh5ygrq