
A Self-Tuning Actor-Critic Algorithm [article]

Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
2021 arXiv   pre-print
We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning  ...  Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.  ...  We now introduce the Self-Tuning Actor-Critic (STAC) agent.  ... 
arXiv:2002.12928v5 fatcat:hb2n3p6gc5cz7bncgl4c3ck43a
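A minimal sketch of the self-tuning idea in the STAC abstract above: a differentiable hyperparameter is adjusted by a meta-gradient step, i.e. by differentiating an outer loss through the inner parameter update. The toy losses, names, and step sizes are hypothetical illustrations, not the STAC implementation.

```python
import numpy as np

# Toy meta-gradient self-tuning (illustrative, not the STAC code):
# an inner loss depends on a differentiable hyperparameter eta; after
# an inner gradient step, an outer loss evaluates the updated
# parameters, and eta is nudged by d L_outer / d eta via the chain rule.

rng = np.random.default_rng(0)
theta = rng.normal(size=2)          # "agent" parameters (hypothetical)
eta = 0.5                           # differentiable hyperparameter
alpha, beta = 0.1, 0.05             # inner and outer step sizes
target = np.array([1.0, -1.0])

for step in range(200):
    # Inner loss mixes two objectives with weight eta.
    grad_a = theta - target                 # grad of 0.5*||theta - target||^2
    grad_b = theta                          # grad of 0.5*||theta||^2 (regulariser)
    inner_grad = eta * grad_a + (1.0 - eta) * grad_b
    theta_new = theta - alpha * inner_grad

    # Outer (validation) loss judges only distance to the target.
    douter_dtheta_new = theta_new - target
    dtheta_new_deta = -alpha * (grad_a - grad_b)
    meta_grad = douter_dtheta_new @ dtheta_new_deta

    eta = float(np.clip(eta - beta * meta_grad, 0.0, 1.0))
    theta = theta_new

print("tuned eta:", round(eta, 3), "theta:", np.round(theta, 3))
```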

Self-Tuning Two Degree-of-Freedom Proportional–Integral Control System Based on Reinforcement Learning for a Multiple-Input Multiple-Output Industrial Process That Suffers from Spatial Input Coupling

Fumitake Fujii, Akinori Kaneishi, Takafumi Nii, Ryu'ichiro Maenishi, Soma Tanaka
2021 Processes  
Theoretically, the self-tuning functionality of the proposed control system is based on the actor-critic reinforcement learning algorithm.  ...  We specifically target a thin film production process as an example of such a MIMO process and propose a self-tuning two-degree-of-freedom PI controller for the film thickness control problem.  ...  We use the actor-critic RL algorithm to synthesize the self-tuning two-degree-of-freedom PI control system. We introduced RBF networks to both the actor and critic independently.  ... 
doi:10.3390/pr9030487 fatcat:cxb3dopamnbjvk54qcw4l2dvnq
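A rough sketch of the tuning loop the abstract describes, heavily simplified: the "actor" is a Gaussian policy over the PI gains (Kp, Ki), the "critic" is a running return baseline, and the gains are updated with an advantage-weighted policy-gradient step on a single-loop first-order plant. The plant, step sizes, and the Gaussian-over-gains actor are hypothetical stand-ins for the paper's RBF actor-critic on a MIMO film process.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_episode(kp, ki, steps=200, dt=0.05):
    """Track a unit step with a PI controller on a first-order plant."""
    y, integ, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        e = 1.0 - y
        integ += e * dt
        u = kp * e + ki * integ
        y += dt * (-y + u)           # plant: tau*dy/dt = -y + u, tau = 1
        cost += e * e * dt
    return -cost                      # return = negative integral squared error

mean = np.array([0.5, 0.1])           # actor mean for (Kp, Ki)
sigma, lr = 0.2, 0.02
baseline = None

for episode in range(300):
    raw = mean + sigma * rng.normal(size=2)
    gains = np.maximum(raw, 1e-3)     # keep the controller gains positive
    ret = run_episode(*gains)
    baseline = ret if baseline is None else 0.9 * baseline + 0.1 * ret
    advantage = ret - baseline        # critic-style baseline correction
    # Gaussian policy gradient: d log pi / d mean = (raw - mean) / sigma^2
    mean = np.maximum(mean + lr * advantage * (raw - mean) / sigma**2, 1e-3)

print("tuned gains Kp, Ki:", np.round(mean, 3))
```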

Deep Reinforced Self-Attention Masks for Abstractive Summarization (DR.SAS) [article]

Ankit Chadha, Mohamed Masoud
2019 arXiv   pre-print
We propose DR.SAS, which applies the Actor-Critic (AC) algorithm to learn a dynamic self-attention distribution over the tokens to reduce redundancy and generate factual and coherent summaries to improve  ...  After performing hyperparameter tuning, we achieved better ROUGE results compared to the baseline.  ...  Actor-Critic Method: Actor-critic methods are popular deep reinforcement learning algorithms and provide an effective variation of the family of Monte Carlo Policy Gradients (REINFORCE). The policy gradient  ... 
arXiv:2001.00009v1 fatcat:v3jnpaybzbdxplyq5zf7wmpipq
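For the REINFORCE-to-actor-critic step mentioned in the snippet above, a tiny self-contained illustration: a softmax policy over two actions, with a learned state value acting as the critic/baseline that turns the Monte Carlo return into a lower-variance advantage. The bandit rewards and step sizes are toy choices, not anything from DR.SAS.

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.zeros(2)        # actor: action preferences
value = 0.0                # critic: value of the single state
alpha_actor, alpha_critic = 0.1, 0.1
true_means = np.array([0.2, 0.8])

for t in range(2000):
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = true_means[a] + 0.1 * rng.normal()

    td_error = r - value                 # advantage estimate from the critic
    value += alpha_critic * td_error     # critic update
    grad_log = -probs                    # d log pi(a) / d prefs
    grad_log[a] += 1.0
    prefs += alpha_actor * td_error * grad_log   # actor update

print("policy probabilities:", np.round(probs, 3))
```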

Towards Automatic Actor-Critic Solutions to Continuous Control [article]

Jake Grigsby, Jin Yong Yoo, Yanjun Qi
2021 arXiv   pre-print
This paper creates an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm.  ...  Model-free off-policy actor-critic methods are an efficient solution to complex continuous control tasks.  ...  This paper looks to automate the process of tuning an actor-critic algorithm and creates an out-of-the-box solution to dense-reward continuous control problems.  ... 
arXiv:2106.08918v2 fatcat:2hy6rrfmoffx3be5xsdgr3krjq
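A sketch of the kind of evolutionary outer loop the abstract describes: a small population of hyperparameter settings is evaluated, the better half survives, and children are resampled around the survivors. The `fitness` function here is a synthetic stand-in for "train an agent and return its score", and the two hyperparameters (learning rate, target-smoothing tau) are illustrative assumptions rather than the paper's actual search space.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(hp):
    # Stand-in for an expensive training run: pretend the best setting
    # is lr = 3e-4 and tau = 5e-3 (hypothetical).
    lr, tau = hp
    return -((np.log10(lr) + 3.5) ** 2 + (np.log10(tau) + 2.3) ** 2)

# Initial population: log-uniform samples of (lr, tau).
pop = [np.array([10 ** rng.uniform(-5, -2), 10 ** rng.uniform(-4, -1)])
       for _ in range(8)]

for generation in range(20):
    scored = sorted(pop, key=fitness, reverse=True)
    survivors = scored[: len(scored) // 2]
    # Children: log-normal perturbations of the survivors.
    children = [np.maximum(p * 10 ** rng.normal(0, 0.1, size=2), 1e-8)
                for p in survivors]
    pop = survivors + children

best = max(pop, key=fitness)
print("best lr, tau:", best)
```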

A Self-adaptive LSAC-PID Approach based on Lyapunov Reward Shaping for Mobile Robots [article]

Xinyi Yu and Siyu Xu and Yuehai Fan and Linlin Ou
2021 arXiv   pre-print
To solve the coupling problem of control loops and the adaptive parameter tuning problem in the multi-input multi-output (MIMO) PID control system, a self-adaptive LSAC-PID algorithm is proposed based  ...  Then, to improve the convergence speed of RL and the stability of mobile robots, a Lyapunov-based reward shaping soft actor-critic (LSAC) algorithm is proposed based on Lyapunov theory and potential-based  ...  Haarnoja [17] proposed an off-policy maximum entropy actor-critic algorithm with a stochastic actor, that is, soft actor-critic (SAC), which ensures effective learning of samples and system stability.  ... 
arXiv:2111.02283v1 fatcat:o5277qgnkjd7hi4axaqlir4ilm
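A minimal sketch of potential-based reward shaping, the mechanism the LSAC abstract builds on: the shaped reward adds gamma*Phi(s') - Phi(s) for a potential Phi, which here is a hypothetical Lyapunov-like negative distance to a goal. The environment, goal, and potential are illustrative only.

```python
import numpy as np

GAMMA = 0.99
GOAL = np.array([0.0, 0.0])

def potential(state):
    """Lyapunov-style potential: closer to the goal is better."""
    return -np.linalg.norm(state - GOAL)

def shaped_reward(state, raw_reward, next_state):
    # Potential-based shaping leaves the optimal policy unchanged.
    return raw_reward + GAMMA * potential(next_state) - potential(state)

s = np.array([2.0, -1.0])
s_next = np.array([1.5, -0.8])
print("shaped reward:", round(shaped_reward(s, -1.0, s_next), 4))
```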

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward [article]

S.A. Murphy, Y. Deng, E.B. Laber, H.R. Maei, R.S. Sutton, K. Witkiewitz
2016 arXiv   pre-print
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals.  ...  This algorithm is developed with a view towards its use in mobile health.  ...  The actor algorithm is represented by the maximization steps in the batch, off-policy, actor-critic algorithm given in Algorithm 2.  ... 
arXiv:1607.05047v1 fatcat:vc2csijpk5eozgx3hj3jsvg2wy
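A sketch of the average-reward flavour of actor-critic the abstract mentions: instead of discounting, the critic tracks an estimate of the average reward rho and uses the differential TD error delta = r - rho + V(s') - V(s). The two-state Markov reward process below is a toy, not the paper's mobile-health setting, and only the critic side is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.zeros(2)                  # differential values of two states
rho = 0.0                        # average-reward estimate
alpha_v, alpha_rho = 0.05, 0.01
rewards = {0: 1.0, 1: 0.0}

s = 0
for t in range(5000):
    s_next = rng.integers(2)                 # uniform transitions
    r = rewards[s]
    delta = r - rho + V[s_next] - V[s]       # differential TD error
    V[s] += alpha_v * delta
    rho += alpha_rho * delta
    s = s_next

print("estimated average reward:", round(rho, 3))   # true value is 0.5
```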

Combining Self-organizing Maps with Mixtures of Experts: Application to an Actor-Critic Model of Reinforcement Learning in the Basal Ganglia [chapter]

Mehdi Khamassi, Louis-Emmanuel Martinet, Agnès Guillot
2006 Lecture Notes in Computer Science  
In a reward-seeking task performed in a continuous environment, our previous work compared several Actor-Critic (AC) architectures implementing dopamine-like reinforcement learning mechanisms in the rat's  ...  They lead to good performance, even if they are still weaker than our hand-tuned task decomposition and than the best Kohonen maps we obtained.  ...  Discussion: In this work, we have combined three different self-organizing maps with a mixture of Actor-Critic experts.  ... 
doi:10.1007/11840541_33 fatcat:5urdqlvwsjhofhpfcltu4cgraq
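A minimal Kohonen self-organising map update, the state-categorisation component the chapter combines with a mixture of actor-critic experts. Map size, input data, and the learning/neighbourhood schedules are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, dim = 16, 2
weights = rng.uniform(0, 1, size=(n_units, dim))   # 1-D map for simplicity

def som_update(x, t, lr0=0.5, sigma0=3.0, t_max=2000.0):
    lr = lr0 * (1.0 - t / t_max)                        # decaying learning rate
    sigma = max(sigma0 * (1.0 - t / t_max), 0.5)        # shrinking neighbourhood
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
    dist = np.abs(np.arange(n_units) - bmu)               # grid distance to BMU
    h = np.exp(-(dist ** 2) / (2 * sigma ** 2))           # neighbourhood kernel
    weights[:] += lr * h[:, None] * (x - weights)

for t in range(2000):
    som_update(rng.uniform(0, 1, size=dim), t)

print("first map units:\n", np.round(weights[:4], 2))
```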

Model learning actor-critic algorithms: Performance evaluation in a motion control task

Ivo Grondman, Lucian Busoniu, Robert Babuska
2012 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)  
In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the learning by constructing a model online through local linear regression (LLR).  ...  Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator.  ...  Model Learning Actor-Critic (MLAC) learns a process model and employs it to update the actor.  ... 
doi:10.1109/cdc.2012.6426427 dblp:conf/cdc/GrondmanBB12 fatcat:iihtxvfeg5bdfg6c4bsvksf4jm
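A sketch of the local linear regression (LLR) model-learning idea the abstract refers to: store observed transitions, and for a query (state, action) fit an affine model on its k nearest neighbours to predict the next state. The dynamics, noise level, and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    return 0.9 * s + 0.1 * a            # unknown to the learner

# Memory of (state, action) -> next_state samples.
X = rng.uniform(-1, 1, size=(200, 2))    # columns: state, action
Y = np.array([true_dynamics(s, a) for s, a in X]) + 0.01 * rng.normal(size=200)

def llr_predict(query, k=10):
    d = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(d)[:k]                            # k nearest neighbours
    A = np.hstack([X[idx], np.ones((k, 1))])           # affine local model
    coef, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
    return np.append(query, 1.0) @ coef

q = np.array([0.3, -0.5])
print("LLR prediction:", round(llr_predict(q), 4),
      "true:", round(true_dynamics(*q), 4))
```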

Intelligent Control Strategy for Transient Response of a Variable Geometry Turbocharger System Based on Deep Reinforcement Learning

Hu, Yang, Li, Li, Bai
2019 Processes  
In addition, the proposed strategy is able to adapt to the changing environment and hardware aging over time by adaptively tuning the algorithm in a self-learning manner online, making it attractive to  ...  Using a fine-tuned proportional-integral-derivative (PID) controller as a benchmark, the results show that the control performance based on the proposed DDPG algorithm can achieve a good transient  ...  DDPG algorithm: Randomly initialize critic network Q(s, a|θ^Q) and actor μ(s|θ^μ) with weights θ^Q and θ^μ.  ... 
doi:10.3390/pr7090601 fatcat:55x4mcpuuvchff7odly3x72zyi
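A sketch of the DDPG bookkeeping the snippet above describes: a critic Q(s, a|θ^Q) and actor μ(s|θ^μ) are initialised randomly, their target copies start as exact clones, and the targets then track the online networks with the soft update θ' ← τθ + (1 − τ)θ'. The "networks" below are plain weight vectors with hypothetical sizes, used only to show the update.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 0.005

theta_q = rng.normal(size=8)       # critic weights (hypothetical size)
theta_mu = rng.normal(size=6)      # actor weights (hypothetical size)
theta_q_target = theta_q.copy()    # target networks start as copies
theta_mu_target = theta_mu.copy()

def soft_update(target, online, tau=TAU):
    return tau * online + (1.0 - tau) * target

for step in range(3):
    # ... gradient updates to theta_q and theta_mu would happen here ...
    theta_q = theta_q + 0.01 * rng.normal(size=theta_q.shape)
    theta_mu = theta_mu + 0.01 * rng.normal(size=theta_mu.shape)
    theta_q_target = soft_update(theta_q_target, theta_q)
    theta_mu_target = soft_update(theta_mu_target, theta_mu)

print("target drift:", np.linalg.norm(theta_q - theta_q_target))
```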

Self-Adaptive Double Bootstrapped DDPG

Zhuobin Zheng, Chun Yuan, Zhihui Lin, Yangyang Cheng, Hanghao Wu
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
In this work, we propose Self-Adaptive Double Bootstrapped DDPG (SOUP), an algorithm that extends DDPG to a bootstrapped actor-critic architecture.  ...  To alleviate the instability, a self-adaptive confidence mechanism is introduced to dynamically adjust the weights of bootstrapped heads and enhance the ensemble performance effectively and efficiently  ...  Our approach is built upon the DDPG algorithm [Lillicrap et al., 2016], a model-free, off-policy actor-critic [Konda and Tsitsiklis, 2000] approach consisting of a Q-function (the critic) and a policy  ... 
doi:10.24963/ijcai.2018/444 dblp:conf/ijcai/ZhengYLCW18 fatcat:obfypk474reabohfw6lldt54ue
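A sketch of the self-adaptive confidence idea described above: two bootstrapped critic heads produce value estimates, and their mixture weights are re-derived from recent prediction errors, so a lower-error head gets higher confidence. The softmax-over-negative-errors weighting rule and the stand-in heads below are hypothetical, not the SOUP formula.

```python
import collections
import numpy as np

rng = np.random.default_rng(0)
recent_errors = [collections.deque(maxlen=50) for _ in range(2)]

def head_estimates(state):
    # Stand-in for two bootstrapped heads (head 0 is noisier by design).
    return np.array([state + 0.3 * rng.normal(), state + 0.05 * rng.normal()])

for t in range(200):
    true_value = rng.uniform(0, 1)
    est = head_estimates(true_value)
    for i in range(2):
        recent_errors[i].append(abs(est[i] - true_value))

mean_err = np.array([np.mean(e) for e in recent_errors])
weights = np.exp(-mean_err / mean_err.mean())     # lower error -> higher weight
weights /= weights.sum()
print("confidence weights:", np.round(weights, 3))  # head 2 should dominate
```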

A Self-adaptive SAC-PID Control Approach based on Reinforcement Learning for Mobile Robots [article]

Xinyi Yu, Yuehai Fan, Siyu Xu, Linlin Ou
2021 arXiv   pre-print
A new hierarchical structure is developed, which includes the upper controller based on soft actor-critic (SAC), one of the most competitive continuous control algorithms, and the lower controller based  ...  To tackle these problems, we propose a self-adaptive model-free SAC-PID control approach based on reinforcement learning for automatic control of mobile robots.  ...  In order to solve the above problems, Haarnoja [31] proposed an off-policy maximum entropy actor-critic algorithm with a stochastic actor, that is, soft actor-critic (SAC), which ensures the effective learning  ... 
arXiv:2103.10686v1 fatcat:yuvleg37kbb65prncgofdx3gpu
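A short sketch of the maximum-entropy objective behind soft actor-critic, which the SAC-PID abstracts above build on: the soft value of a state adds an entropy bonus α·H(π(·|s)) to the expected Q-value, and for discrete actions the maximising policy is a softmax of Q/α. The Q-values and temperature below are toy choices.

```python
import numpy as np

ALPHA = 0.2                      # entropy temperature
q_values = np.array([1.0, 0.5, 0.2])

# Soft policy: softmax of Q/alpha (closed-form maximiser for discrete actions).
pi = np.exp(q_values / ALPHA)
pi /= pi.sum()
entropy = -np.sum(pi * np.log(pi))
soft_value = float(pi @ q_values + ALPHA * entropy)

print("soft policy:", np.round(pi, 3), "soft state value:", round(soft_value, 3))
```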

Efficient Sequence Labeling with Actor-Critic Training [article]

Saeed Najafi, Colin Cherry, Grzegorz Kondrak
2018 arXiv   pre-print
We frame the prediction of the output sequence as a sequential decision-making process, where we train the network with an adjusted actor-critic algorithm (AC-RNN).  ...  We also show that our training strategy is significantly better than other techniques for addressing RNN's exposure bias, such as Scheduled Sampling and Self-Critical policy training.  ...  Self-Critical Comparisons: In our final experiment, we compare the adjusted actor-critic training with the Self-Critical policy training of Rennie et al.  ... 
arXiv:1810.00428v1 fatcat:cbww4tqra5fp3kmjl3tgmxx5bq
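A small sketch of the sequence-labeling-as-MDP framing described above: the model emits one tag per position, receives reward 1 when the tag matches the reference, and the per-position return (discounted sum of future rewards) is what an actor-critic learner would regress its critic toward. The tags, sentence, and discount are toy choices, not the AC-RNN setup.

```python
GAMMA = 0.95
predicted = ["B-PER", "O", "B-LOC", "O"]
reference = ["B-PER", "O", "B-ORG", "O"]

# Per-position reward: 1 for a correct tag, 0 otherwise.
rewards = [1.0 if p == r else 0.0 for p, r in zip(predicted, reference)]

# Returns G_t = r_t + gamma * G_{t+1}, computed right to left.
returns = []
g = 0.0
for r in reversed(rewards):
    g = r + GAMMA * g
    returns.append(g)
returns.reverse()

print("rewards:", rewards)
print("per-position returns:", [round(g, 3) for g in returns])
```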

Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm

ZhiBin Zhang, XinHong Li, JiPing An, WanXin Man, GuoHui Zhang, Marco Pizzarelli
2020 International Journal of Aerospace Engineering  
Considering the continuity of state space and action space, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm based on actor-critic architecture is adopted.  ...  The proposed PID-Guide TD3 algorithm has faster training speed and higher stability than the TD3 algorithm.  ...  TD3 uses a total of six neural networks, namely the actor network π_ϕ, actor target network π_ϕ′, critic networks Q_θ1 and Q_θ2, and critic target networks Q_θ1′ and Q_θ2′.  ... 
doi:10.1155/2020/8874619 fatcat:6o6d5d7olrezziixbeuubdy44e
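A sketch of the TD3 target computation that those six networks serve: the target actor proposes an action, clipped noise is added (target policy smoothing), both target critics score it, and the Bellman target uses the minimum of the two (clipped double Q). The "networks" below are simple stand-in functions with hypothetical shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, NOISE_STD, NOISE_CLIP = 0.99, 0.2, 0.5

def actor_target(s):            # stand-in for pi_phi'
    return np.tanh(0.5 * s)

def critic_target_1(s, a):      # stand-in for Q_theta1'
    return -(s - a) ** 2 + 1.0

def critic_target_2(s, a):      # stand-in for Q_theta2'
    return -(s - a) ** 2 + 0.9

def td3_target(r, s_next, done):
    noise = np.clip(NOISE_STD * rng.normal(), -NOISE_CLIP, NOISE_CLIP)
    a_next = np.clip(actor_target(s_next) + noise, -1.0, 1.0)
    q_min = min(critic_target_1(s_next, a_next), critic_target_2(s_next, a_next))
    return r + GAMMA * (1.0 - done) * q_min

print("TD3 target:", round(td3_target(r=0.1, s_next=0.4, done=0.0), 4))
```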

Efficient Sequence Labeling with Actor-Critic Training [chapter]

Saeed Najafi, Colin Cherry, Grzegorz Kondrak
2019 Lecture Notes in Computer Science  
We frame the prediction of the output sequence as a sequential decision-making process, where we train the network with an adjusted actor-critic algorithm (AC-RNN).  ...  We also show that our training strategy is significantly better than other techniques for addressing RNN's exposure bias, such as Scheduled Sampling and Self-Critical policy training.  ...  Self-Critical Comparisons: In our final experiment, we compare the adjusted actor-critic training with the Self-Critical policy training of Rennie et al. (2017), which does not require a critic model.  ... 
doi:10.1007/978-3-030-18305-9_46 fatcat:qisc2mmsyzhehp6sz333fitnpy

Dual Behavior Regularized Reinforcement Learning [article]

Chapman Siu, Jason Traish, Richard Yi Da Xu
2021 arXiv   pre-print
We demonstrate this new algorithm can outperform several strong baseline models in different contexts based on a range of continuous environments.  ...  However, many of these approaches presume optimal or near-optimal experiences or the presence of a consistent environment.  ...  The advantage-weighted actor-critic (AWAC) algorithm trains an off-policy critic and an actor with an implicit policy constraint without the use of a behavior policy in the offline reinforcement learning  ... 
arXiv:2109.09037v1 fatcat:wou5srspaje4lc77ug5rrt3upi
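A sketch of the advantage-weighted actor update the abstract refers to (AWAC-style): actions drawn from logged data are re-weighted by exp(advantage / λ), so the actor imitates the data while leaning toward higher-advantage actions. The Gaussian actor, synthetic dataset, and λ below are hypothetical toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
LAM = 1.0

# Logged (state-free) dataset: candidate actions and their advantages.
actions = rng.uniform(-1, 1, size=100)
advantages = -(actions - 0.6) ** 2 + 0.1 * rng.normal(size=100)

# Gaussian "actor": fit its mean by advantage-weighted maximum likelihood.
weights = np.exp(advantages / LAM)
weights /= weights.sum()
new_mean = float(weights @ actions)

print("advantage-weighted actor mean:", round(new_mean, 3))  # pulled toward 0.6
```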
Showing results 1 — 15 out of 11,545 results