Model-Augmented Actor-Critic: Backpropagating through Paths
[article]
2020
arXiv
pre-print
Instabilities of learning across many timesteps are prevented by using a terminal value function, learning the policy in an actor-critic fashion. ...
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning. ...
MODEL-AUGMENTED ACTOR-CRITIC OBJECTIVE: Among model-free methods, actor-critic methods have shown superior performance in terms of sample efficiency and asymptotic performance (Haarnoja et al., 2018a). ...
arXiv:2005.08068v1
fatcat:mss7asjil5ezjkvopw75ihkppq
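The snippet above describes a policy objective that rolls a learned, differentiable dynamics model forward for a few steps and bootstraps the tail with a terminal value function. A minimal sketch of such a backprop-through-paths loss, assuming PyTorch-style tensors and illustrative policy, model, and value callables (names are assumptions, not the paper's code):

    def maac_policy_loss(policy, model, value, s0, horizon=5, gamma=0.99):
        # H-step rollout through the learned, differentiable dynamics model;
        # the terminal critic value(s_H) bootstraps the remaining return, and
        # gradients flow back through the whole path (backprop through paths).
        s, ret, discount = s0, 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)              # reparameterized (differentiable) action
            s, r = model(s, a)         # predicted next state and reward
            ret = ret + discount * r
            discount *= gamma
        ret = ret + discount * value(s).squeeze(-1)
        return -ret.mean()             # ascend the model-predicted return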
Stochastic Activation Actor Critic Methods
[chapter]
2020
Lecture Notes in Computer Science
Yet effective and general approaches to include such elements in actor-critic models are still lacking. ...
Inspired by the aforementioned techniques, we propose an effective way to inject randomness into actor-critic models to improve general exploratory behavior and reflect environment uncertainty. ...
to discover means to strengthen actor-critic methods with stochastic modeling components. ...
doi:10.1007/978-3-030-46133-1_7
fatcat:4byrgso2wjdydpamygyryuc6gu
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[article]
2018
arXiv
pre-print
In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. ...
By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior ...
We will discuss how we can devise a soft actor-critic algorithm through a policy iteration formulation, where we instead evaluate the Q-function of the current policy and update the policy through an off-policy ...
arXiv:1801.01290v2
fatcat:5737bv4lmzdzxbv6xreow6phfy
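The entry above is the canonical maximum-entropy actor-critic. A minimal sketch of the SAC actor update with clipped double-Q critics, assuming PyTorch and an illustrative policy.rsample interface that returns a reparameterized action and its log-probability (the interface name is an assumption, not the authors' API):

    import torch

    def sac_actor_loss(policy, q1, q2, states, alpha=0.2):
        # Maximum-entropy actor update: minimize E[ alpha * log pi(a|s) - Q(s, a) ]
        # with actions drawn via the reparameterization trick so the gradient
        # passes through the sampled action.
        actions, log_probs = policy.rsample(states)               # assumed interface
        q = torch.min(q1(states, actions), q2(states, actions))   # clipped double-Q
        return (alpha * log_probs - q).mean()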
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
[article]
2020
arXiv
pre-print
Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling. ...
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation ...
Improving exploration in soft actor-critic with normalizing flows policies. arXiv preprint arXiv:1906.02771, 2019.
Wibisono, A., Wilson, A. C., and Jordan, M. I. ...
arXiv:2002.07101v1
fatcat:xqhunznulzc23oxiixbkamrx3a
Caching Transient Content for IoT Sensing: Multi-Agent Soft Actor-Critic
[article]
2020
arXiv
pre-print
To efficiently handle the exponentially large number of actions, we devise a novel reinforcement learning approach, which is a discrete multi-agent variant of soft actor-critic (SAC). ...
Specifically, we model the cache update problem as a cooperative multi-agent Markov decision process with the goal of minimizing the long-term average weighted cost. ...
Proposed Multi-Agent Discrete Soft Actor-Critic Learning: SAC is a state-of-the-art RL algorithm, owing to an entropy-regularized formalism that augments exploration [30]. ...
arXiv:2008.13191v1
fatcat:7bdizygnobbixppkoo4f4azmc4
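The snippet above describes a discrete-action variant of SAC. With a categorical policy, the entropy-regularized actor loss can be computed in closed form over all actions; a single-agent sketch follows (the paper's cooperative multi-agent formulation, with per-agent policies and a shared weighted cost, is omitted):

    import torch.nn.functional as F

    def discrete_sac_actor_loss(logits, q_values, alpha=0.2):
        # For a categorical policy the expectation over actions is exact, so the
        # maximum-entropy actor loss sums pi(a|s) * (alpha * log pi(a|s) - Q(s, a))
        # over the action dimension instead of sampling.
        log_pi = F.log_softmax(logits, dim=-1)   # (batch, n_actions)
        pi = log_pi.exp()
        return (pi * (alpha * log_pi - q_values)).sum(dim=-1).mean()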
Loss is its own Reward: Self-Supervision for Reinforcement Learning
[article]
2017
arXiv
pre-print
To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. ...
ML-DDPG (Munk et al., 2016) extends actor-critic with a one-step predictive model of the successor state and reward. ...
The observation mapping is learned by the first layer of the model, transferred to the actor-critic network, and then fixed. ...
arXiv:1612.07307v2
fatcat:fcctw5rmhbc6bnoqj7lbqyo5ju
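The ML-DDPG-style idea mentioned above, a one-step predictive model of the successor state and reward used as an auxiliary loss alongside the actor-critic objective, can be sketched as follows, assuming PyTorch and illustrative encoder, dynamics_head, and reward_head modules (all names are assumptions):

    import torch
    import torch.nn.functional as F

    def one_step_aux_loss(encoder, dynamics_head, reward_head, s, a, s_next, r):
        # Self-supervised auxiliary loss: from the shared encoding of (state, action),
        # predict the successor encoding and the reward; its gradient also shapes
        # the representation used by the actor-critic.
        z = encoder(s)
        z_next = encoder(s_next).detach()                 # prediction target in latent space
        za = torch.cat([z, a], dim=-1)
        return F.mse_loss(dynamics_head(za), z_next) + \
               F.mse_loss(reward_head(za).squeeze(-1), r)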
A2C: Attention-Augmented Contrastive Learning for State Representation Extraction
2020
Applied Sciences
Finally an attention-augmented contrastive learning method called A2C is obtained. ...
The bottom part is the shared Actor-Critic Model.
Moreover, a shared Actor-Critic model was updated using those training samples. ...
doi:10.3390/app10175902
fatcat:rt7bxtqyvbhyznpjhrnxopbn4i
Accelerated Policy Learning with Parallel Differentiable Simulation
[article]
2022
arXiv
pre-print
Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments ...
In a typical model-based reinforcement learning algorithm, the learned model can be used in two ways: (1) data augmentation, (2) policy gradient estimation with backpropagation through a differentiable ...
SHORT-HORIZON ACTOR-CRITIC (SHAC): To resolve the aforementioned issues of gradient-based policy learning, we propose the Short-Horizon Actor-Critic method (SHAC). ...
arXiv:2204.07137v1
fatcat:7mc3mbb4cnglrh3auxavtvb47m
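The SHAC snippet above combines a truncated learning window with a terminal critic. A minimal sketch of one short-horizon window, assuming PyTorch-style tensors and an illustrative differentiable sim_step function (not the released implementation):

    def shac_window_loss(policy, sim_step, value, s, horizon=32, gamma=0.99):
        # Short-horizon window: detach the incoming state so backpropagation through
        # time is truncated at the window boundary (avoiding exploding/vanishing
        # gradients), roll the differentiable simulator for `horizon` steps, then
        # bootstrap the tail with the critic.
        s = s.detach()
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            s, r = sim_step(s, a)      # differentiable physics step
            ret = ret + discount * r
            discount *= gamma
        ret = ret + discount * value(s).squeeze(-1)
        return -ret.mean(), s          # state carries over to start the next window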
Supplementary Information: Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning
[article]
2020
Zenodo
Discounted future loss backpropagation through time [42-46], and playing score-based computer games [47, 48]. ...
Our actor, critic and generator are trained together. It follows that generator losses, which our critic learns to predict, decrease throughout training as generator performance improves. ...
doi:10.5281/zenodo.4304454
fatcat:xechwbtb7resjmd4dhtw2gcv3y
Totally model-free reinforcement learning by actor-critic Elman networks in non-Markovian domains
1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227)
A standard "actor-critic" neural network model has two separate components: the action (actor) network and the value (critic) network. ...
Due to the nature of neural model-free learning, the agent needs many iterations to find the optimal actions even in small-scale path problems. ...
Through repeated passes, the actor-critic Elman nets learned the optimal actions as well as the estimated maximum values. Table 3. ...
doi:10.1109/ijcnn.1998.687169
fatcat:r44xdijcynf7jftn6kqpp362nq
Continuous-Time Adaptive Critics
2007
IEEE Transactions on Neural Networks
Index Terms-Actor-critic adaptation, adaptive critic design (ACD), approximate dynamic programming, backpropagation through time (BPTT), continuous adaptive critic designs, real-time recurrent learning ...
Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. ...
The influence of the indirect path through is lost for and the gradient in this case is the instantaneous gradient used in BPTT. ...
doi:10.1109/tnn.2006.889499
pmid:17526332
fatcat:v5msmlbvnvaorifzzihuzmr6fi
Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning
[article]
2021
arXiv
pre-print
Source code, pretrained models, and training data are openly accessible at https://github.com/Jeffrey-Ede/adaptive-scans ...
Thus, we present a prototype for a contiguous sparse scan system that piecewise adapts scan paths to specimens as they are scanned. ...
Discounted future loss backpropagation through time [44-48], and playing score-based computer games [49, 50]. ...
arXiv:2004.02786v8
fatcat:tcbkzlcqkbgczarb6adguoxzni
A Self-adaptive SAC-PID Control Approach based on Reinforcement Learning for Mobile Robots
[article]
2021
arXiv
pre-print
A new hierarchical structure is developed, which includes the upper controller based on soft actor-critic (SAC), one of the most competitive continuous control algorithms, and the lower controller based ...
Soft actor-critic receives the dynamic information of the mobile robot as input, and simultaneously outputs the optimal parameters of incremental PID controllers to compensate for the error between the ...
Path 1 | Model 1 | 20 | 20 | 100 | 0.260±0.010
Path 2 | Model 2 | 20 | 20 | 100 | 0.255±0.004
Path 3 | Model 3 | 20 | 17 | 85 | 0.251±0.017
Path 4 | Model 4 | 20 | 19 | 95 | 0.259±0.005
Path 1 | Model 3 | 20 | 18 | 90 | 0.254 ...
arXiv:2103.10686v1
fatcat:yuvleg37kbb65prncgofdx3gpu
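The snippet above has the SAC policy emit the parameters of an incremental PID controller at every control step. A minimal sketch of such a lower-level controller, using the standard incremental (velocity-form) PID update with gains supplied externally by the learned policy (class and method names are illustrative, not from the paper):

    class IncrementalPID:
        # Incremental (velocity-form) PID; the RL policy supplies (kp, ki, kd)
        # at every control step instead of using fixed gains.
        def __init__(self):
            self.u = 0.0        # last control output u_{k-1}
            self.e1 = 0.0       # e_{k-1}
            self.e2 = 0.0       # e_{k-2}

        def step(self, error, kp, ki, kd):
            # u_k = u_{k-1} + kp*(e_k - e_{k-1}) + ki*e_k + kd*(e_k - 2*e_{k-1} + e_{k-2})
            du = kp * (error - self.e1) + ki * error \
                 + kd * (error - 2.0 * self.e1 + self.e2)
            self.u += du
            self.e2, self.e1 = self.e1, error
            return self.u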
Sequence Labeling and Transduction with Output-Adjusted Actor-Critic Training of RNNs
2018
We show that the output-adjusted actor-critic training is significantly better than other techniques for addressing RNN's exposure bias, such as Scheduled Sampling, and Self-Critical policy training. ...
Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks. ...
We then continue training from the best model using our output-adjusted actor-critic objective. ...
doi:10.7939/r39z90t8b
fatcat:62avmkie2nfevg4ksmer6d7sdy
A Survey on Visual Navigation for Artificial Agents with Deep Reinforcement Learning
2020
IEEE Access
Actor-critic methods [24] are hybrid methods of value-based and policy-based algorithms. ...
[63] fed the target images into actor-critic neural networks in addition to the environmental observations. ...
doi:10.1109/access.2020.3011438
fatcat:ie6qvu24qbapbjxtiudh7fumgy
Showing results 1 — 15 out of 299 results