
Model-Augmented Actor-Critic: Backpropagating through Paths [article]

Ignasi Clavera, Violet Fu, Pieter Abbeel
2020 arXiv   pre-print
Instabilities of learning across many timesteps are prevented by using a terminal value function, learning the policy in an actor-critic fashion.  ...  Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning.  ...  MODEL-AUGMENTED ACTOR-CRITIC OBJECTIVE Among model-free methods, actor-critic methods have shown superior performance in terms of sample efficiency and asymptotic performance (Haarnoja et al., 2018a)  ... 
arXiv:2005.08068v1 fatcat:mss7asjil5ezjkvopw75ihkppq

Stochastic Activation Actor Critic Methods [chapter]

Wenling Shang, Douwe van der Wal, Herke van Hoof, Max Welling
2020 Lecture Notes in Computer Science  
Yet effective and general approaches to include such elements in actor-critic models are still lacking.  ...  Inspired by the aforementioned techniques, we propose an effective way to inject randomness into actor-critic models to improve general exploratory behavior and reflect environment uncertainty.  ...  to discover means to strengthen actor-critic methods with stochastic modeling components.  ... 
doi:10.1007/978-3-030-46133-1_7 fatcat:4byrgso2wjdydpamygyryuc6gu

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [article]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework.  ...  By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior  ...  We will discuss how we can devise a soft actor-critic algorithm through a policy iteration formulation, where we instead evaluate the Q-function of the current policy and update the policy through an off-policy  ... 
arXiv:1801.01290v2 fatcat:5737bv4lmzdzxbv6xreow6phfy
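The maximum-entropy framework behind soft actor-critic augments the return with a policy-entropy bonus, so the soft state value is V(s) = E_{a~π}[Q(s,a) − α log π(a|s)]; for a Boltzmann policy over discrete actions this reduces to α·logsumexp(Q(s,·)/α). A minimal sketch of that identity in plain Python (illustrative only, not the authors' implementation; the temperature α and Q-values are placeholders):

```python
import math

def boltzmann_policy(q_values, alpha):
    # pi(a|s) proportional to exp(Q(s,a)/alpha), computed stably
    m = max(q / alpha for q in q_values)
    exps = [math.exp(q / alpha - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def soft_value(q_values, alpha):
    # Entropy-regularized value: V(s) = E_{a~pi}[Q(s,a) - alpha * log pi(a|s)]
    pi = boltzmann_policy(q_values, alpha)
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(pi, q_values))

def logsumexp_value(q_values, alpha):
    # Closed form for the Boltzmann policy: V(s) = alpha * logsumexp(Q(s,.)/alpha)
    m = max(q / alpha for q in q_values)
    return alpha * (m + math.log(sum(math.exp(q / alpha - m) for q in q_values)))
```

Raising α weights entropy more heavily and pushes the policy toward uniform; as α → 0 the soft value approaches max_a Q(s,a).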

Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models [article]

Chin-Wei Huang, Laurent Dinh, Aaron Courville
2020 arXiv   pre-print
Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.  ...  In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation  ...  Improving exploration in soft-actor-critic with normalizing flows policies. arXiv preprint arXiv:1906.02771, 2019. Wibisono, A., Wilson, A. C., and Jordan, M. I.  ... 
arXiv:2002.07101v1 fatcat:xqhunznulzc23oxiixbkamrx3a

Caching Transient Content for IoT Sensing: Multi-Agent Soft Actor-Critic [article]

Xiongwei Wu, Xiuhua Li, Jun Li, P. C. Ching, Victor C. M. Leung, H. Vincent Poor
2020 arXiv   pre-print
To efficiently handle the exponentially large number of actions, we devise a novel reinforcement learning approach, which is a discrete multi-agent variant of soft actor-critic (SAC).  ...  Specifically, we model the cache update problem as a cooperative multi-agent Markov decision process with the goal of minimizing the long-term average weighted cost.  ...  Proposed Multi-Agent Discrete Soft Actor-Critic Learning SAC is a state-of-the-art RL algorithm, which results from an entropy-regularized formalism that augments exploration [30].  ... 
arXiv:2008.13191v1 fatcat:7bdizygnobbixppkoo4f4azmc4

Loss is its own Reward: Self-Supervision for Reinforcement Learning [article]

Evan Shelhamer, Parsa Mahmoudieh, Max Argus, Trevor Darrell
2017 arXiv   pre-print
To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses.  ...  ML-DDPG (Munk et al., 2016) extends actor-critic with a one-step predictive model of the successor state and reward.  ...  The observation mapping is learned by the first layer of the model, transferred to the actor-critic network, and then fixed.  ... 
arXiv:1612.07307v2 fatcat:fcctw5rmhbc6bnoqj7lbqyo5ju

A2C: Attention-Augmented Contrastive Learning for State Representation Extraction

Haoqiang Chen, Yadong Liu, Zongtan Zhou, Ming Zhang
2020 Applied Sciences  
Finally, an attention-augmented contrastive learning method called A2C is obtained.  ...  The bottom part is the shared Actor-Critic model. Figure 4.  ...  Moreover, a shared Actor-Critic model was updated using those training samples.  ... 
doi:10.3390/app10175902 fatcat:rt7bxtqyvbhyznpjhrnxopbn4i

Accelerated Policy Learning with Parallel Differentiable Simulation [article]

Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin
2022 arXiv   pre-print
Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments  ...  In a typical model-based reinforcement learning algorithm, the learned model can be used in two ways: (1) data augmentation, (2) policy gradient estimation with backpropagation through a differentiable  ...  SHORT-HORIZON ACTOR-CRITIC (SHAC) To resolve the aforementioned issues of gradient-based policy learning, we propose the Short-Horizon Actor-Critic method (SHAC).  ... 
arXiv:2204.07137v1 fatcat:7mc3mbb4cnglrh3auxavtvb47m
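The truncated learning window described above amounts to optimizing a short-horizon return that is closed off by the critic's value estimate at the end of the window, which bounds how far gradients must flow back through the differentiable simulator. A hedged sketch of that quantity in plain Python (generic illustration, not the SHAC implementation; `terminal_value` stands in for the learned critic V(s_h)):

```python
def short_horizon_return(rewards, terminal_value, gamma):
    # Short-horizon objective: discounted rewards over a truncated window
    # of length h = len(rewards), plus the discounted terminal value
    # estimate gamma^h * V(s_h) supplied by the critic.
    ret = 0.0
    for t, r in enumerate(rewards):
        ret += (gamma ** t) * r
    ret += (gamma ** len(rewards)) * terminal_value
    return ret
```

In a gradient-based setting, the rewards would be differentiable functions of the policy parameters via the simulator, and the truncated window keeps the backpropagation path short enough to avoid vanishing or exploding gradients.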

Supplementary Information: Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning [article]

Jeffrey Ede
2020 Zenodo  
Discounted future loss backpropagation through time [42, 46], and playing score-based computer games [47, 48].  ...  Our actor, critic and generator are trained together. It follows that generator losses, which our critic learns to predict, decrease throughout training as generator performance improves.  ... 
doi:10.5281/zenodo.4304454 fatcat:xechwbtb7resjmd4dhtw2gcv3y

Totally model-free reinforcement learning by actor-critic Elman networks in non-Markovian domains

E. Mizutani, S.E. Dreyfus
1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227)  
A standard "actor-critic" neural network model has two separate components: the action (actor) network and the value (critic) network.  ...  Due to the nature of neural model-free learning, the agent needs many iterations to find the optimal actions even in small-scale path problems.  ...  Through repeated passes, the actor-critic Elman nets learned the optimal actions as well as the estimated maximum values. Table 3.  ... 
doi:10.1109/ijcnn.1998.687169 fatcat:r44xdijcynf7jftn6kqpp362nq
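An Elman network is a simple recurrent net whose hidden state is fed back through a context layer at the next step, which is what lets the actor and critic carry history in non-Markovian domains. A minimal forward step, sketched in plain Python (the weight shapes and tanh activation are assumptions of this illustration, not the paper's exact setup):

```python
import math

def elman_step(x, h_prev, W_xh, W_hh, b_h):
    # One step of an Elman (simple recurrent) cell:
    #   h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)
    # Plain lists of floats; no framework assumed.
    n = len(b_h)
    h = []
    for i in range(n):
        s = b_h[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h_prev[j] for j in range(n))
        h.append(math.tanh(s))
    return h
```

Unrolling `elman_step` over a trajectory and feeding the final hidden state to separate actor and critic heads gives the recurrent actor-critic structure the abstract refers to.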

Continuous-Time Adaptive Critics

T. Hanselmann, L. Noakes, A. Zaknich
2007 IEEE Transactions on Neural Networks  
Index Terms-Actor-critic adaptation, adaptive critic design (ACD), approximate dynamic programming, backpropagation through time (BPTT), continuous adaptive critic designs, real-time recurrent learning  ...  Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent.  ...  The influence of the indirect path through is lost for and the gradient in this case is the instantaneous gradient used in BPTT.  ... 
doi:10.1109/tnn.2006.889499 pmid:17526332 fatcat:v5msmlbvnvaorifzzihuzmr6fi

Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning [article]

Jeffrey M. Ede
2021 arXiv   pre-print
Source code, pretrained models, and training data are openly accessible at  ...  Thus, we present a prototype for a contiguous sparse scan system that piecewise adapts scan paths to specimens as they are scanned.  ...  Discounted future loss backpropagation through time [44, 48], and playing score-based computer games [49, 50].  ... 
arXiv:2004.02786v8 fatcat:tcbkzlcqkbgczarb6adguoxzni

A Self-adaptive SAC-PID Control Approach based on Reinforcement Learning for Mobile Robots [article]

Xinyi Yu, Yuehai Fan, Siyu Xu, Linlin Ou
2021 arXiv   pre-print
A new hierarchical structure is developed, which includes the upper controller based on soft actor-critic (SAC), one of the most competitive continuous control algorithms, and the lower controller based  ...  Soft actor-critic receives the dynamic information of the mobile robot as input, and simultaneously outputs the optimal parameters of incremental PID controllers to compensate for the error between the  ...  Path 1, Model 1: 20, 20, 100, 0.260±0.010; Path 2, Model 2: 20, 20, 100, 0.255±0.004; Path 3, Model 3: 20, 17, 85, 0.251±0.017; Path 4, Model 4: 20, 19, 95, 0.259±0.005; Path 1, Model 3: 20, 18, 90, 0.254  ... 
arXiv:2103.10686v1 fatcat:yuvleg37kbb65prncgofdx3gpu
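The lower-level incremental PID controller mentioned above outputs a change in the control signal at each step, u_k = u_{k-1} + Kp·(e_k − e_{k-1}) + Ki·e_k + Kd·(e_k − 2e_{k-1} + e_{k-2}), with the SAC policy supplying the gains. A generic sketch (not the paper's code; the gain values below are placeholders):

```python
class IncrementalPID:
    # Incremental (velocity-form) PID: the controller accumulates a
    # per-step change du rather than computing an absolute output,
    # which is the form whose gains an upper-level policy could tune.
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0  # previous error e_{k-1}
        self.e2 = 0.0  # error two steps back e_{k-2}
        self.u = 0.0   # accumulated control output

    def step(self, e):
        du = (self.kp * (e - self.e1)
              + self.ki * e
              + self.kd * (e - 2.0 * self.e1 + self.e2))
        self.e2, self.e1 = self.e1, e
        self.u += du
        return self.u
```

With a constant error and only the integral gain active, the output grows linearly, matching the standard incremental-PID behavior.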

Sequence Labeling and Transduction with Output-Adjusted Actor-Critic Training of RNNs

Saeed Najafi
We show that the output-adjusted actor-critic training is significantly better than other techniques for addressing RNN's exposure bias, such as Scheduled Sampling and Self-Critical policy training.  ...  Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks.  ...  We then continue training from the best model using our output-adjusted actor-critic objective.  ... 
doi:10.7939/r39z90t8b fatcat:62avmkie2nfevg4ksmer6d7sdy

A Survey on Visual Navigation for Artificial Agents with Deep Reinforcement Learning

Fanyu Zeng, Chen Wang, Shuzhi Sam Ge
2020 IEEE Access  
Actor-critic methods [24] are hybrid methods of value-based and policy-based algorithms.  ...  [63] fed the target images into actor-critic neural networks in addition to the environmental observations.  ... 
doi:10.1109/access.2020.3011438 fatcat:ie6qvu24qbapbjxtiudh7fumgy
Showing results 1 — 15 out of 299 results