Coordinate-wise Control Variates for Deep Policy Gradients
[article]
2021
arXiv
pre-print
This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. ...
The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. ...
Conclusion We propose the coordinate-wise control variates for variance reduction in deep policy gradient methods. ...
arXiv:2107.04987v2
fatcat:d6dkth7kbbh4ppzimkramacvlu
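As a rough illustration of the control variates idea mentioned in the entry above, the following minimal sketch compares a score-function gradient estimator with and without a scalar baseline. It is not the paper's coordinate-wise method: the Gaussian sampling distribution, the test function f(x) = x^2, and the name `score_function_samples` are assumptions chosen only to keep the example small.

```python
import random
import statistics


def score_function_samples(theta, baseline, n, seed=0):
    """Per-sample estimates of d/dtheta E[x^2] for x ~ N(theta, 1).

    Each estimate is (f(x) - baseline) * score(x), with f(x) = x^2
    (an assumed toy function) and score(x) = x - theta.
    Subtracting a constant baseline leaves the mean unchanged,
    because the score has zero expectation.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        score = x - theta  # d/dtheta log N(x; theta, 1)
        samples.append((x * x - baseline) * score)
    return samples


theta = 1.0
# No control variate: baseline of zero.
plain = score_function_samples(theta, baseline=0.0, n=50_000)
# Scalar control variate: baseline set to E[f(x)] = theta^2 + 1.
with_cv = score_function_samples(theta, baseline=theta**2 + 1.0, n=50_000)

print("mean without baseline:", statistics.mean(plain))
print("mean with baseline:   ", statistics.mean(with_cv))
print("variance without baseline:", statistics.variance(plain))
print("variance with baseline:   ", statistics.variance(with_cv))
```

Both estimators target the same gradient (2 * theta for this toy problem); only the spread of the per-sample estimates differs. A vector-valued baseline would apply a separate value of `baseline` to each gradient coordinate.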
Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM
[article]
2021
arXiv
pre-print
We use a modified HighRes3DNet model to solve the brain MRI volumetric landmark detection problem. ...
, for better localization of the existing landmarks, in order to identify and locate the important atlas landmarks even in oblique scans. ...
For an ideal MRI image, the policy DA1 adds one of the patient side variations, and along with that, it adds one of the machine side variations. The policy DA2 only simulates machine-side variations. ...
arXiv:2111.00837v1
fatcat:q3rkvcnu6jeojbcql7lmfpxsbm
Gradient Monitored Reinforcement Learning
[article]
2020
arXiv
pre-print
The approach is applied to two discrete (Multi-Robot Co-ordination problem and Atari games) and one continuous control task (MuJoCo) using Advantage Actor-Critic (A2C) and Proximal Policy Optimization ...
This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. ...
Notable examples of the latter include value function based methods like deep Q-networks [6], policy gradient methods like deep deterministic policy gradient [5], Advantage Actor Critic (A2C) [7], ...
arXiv:2005.12108v1
fatcat:c4cgoylw2rcuhfbsdbk5hfdsfu
Multiagent Soft Q-Learning
[article]
2018
arXiv
pre-print
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. ...
To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. ...
We thank Tuomas Haarnoja and Haoran Tang for their helpful comments on implementing Soft Q-Learning. ...
arXiv:1804.09817v1
fatcat:qxwspauyxzgr3ikf6cjnylu6xa
Benchmarking Deep Reinforcement Learning for Continuous Control
[article]
2016
arXiv
pre-print
Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. ...
However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. ...
Acknowledgements We thank Emo Todorov and Yuval Tassa for providing the MuJoCo simulator, and Sergey Levine, Aviv Tamar, Chelsea Finn, and the anonymous ICML reviewers for insightful comments. ...
arXiv:1604.06778v3
fatcat:lceiaesdbvallnt57nbxp7b42q
Team Deep Mixture of Experts for Distributed Power Control
[article]
2020
arXiv
pre-print
We consider the decentralized power control problem as an example to showcase the validity of the proposed model and to compare it against other power control algorithms. ...
In particular, it was established that DNNs can be used to derive policies that are robust with respect to the information noise statistic affecting the local information (e.g. ...
policy for the current uncertainty levels. ...
arXiv:2007.14147v1
fatcat:34i4vsztvrei5izhldw2xevxrm
A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control
[article]
2019
arXiv
pre-print
learning optimal control policy for its own stage. ...
Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture, that can enable finding of optimal control policy of control tasks consisting of multiple ...
Second, SDQL allows stacking different types of Q learning sub-networks, such as the Deep Q Network (DQN) [6] and actor-critic architecture like Deep Deterministic Policy Gradient (DDPG) network [7] ...
arXiv:1911.10684v1
fatcat:zmmcpfx46jcqnd44xpopltoxde
Towards Generalization and Data Efficient Learning of Deep Robotic Grasping
[article]
2020
arXiv
pre-print
Deep reinforcement learning (DRL) has proven to be a powerful paradigm for learning complex control policies autonomously. ...
In this paper, we propose a DRL-based robotic visual grasping framework, in which visual perception and the control policy are trained separately rather than end-to-end. ...
For our framework, we adopt a policy gradient method called Proximal Policy Optimization (PPO) [16], which is well suited to high-dimensional continuous robotic control problems. ...
arXiv:2007.00982v1
fatcat:76cmqwfz5bcp5kbae6xhttzqsa
Deep Deterministic Policy Gradient for Urban Traffic Light Control
[article]
2017
arXiv
pre-print
In order to overcome the large scale of the available state information, we propose to rely on the ability of deep learning approaches to handle large input spaces, in the form of the Deep Deterministic Policy Gradient (DDPG) algorithm. ...
Deep Network Architecture Our neural architecture consists in a Deep Deterministic Actor-Critic Policy Gradient approach. ...
arXiv:1703.09035v2
fatcat:ivjtwotq4zgzpffiem5urqlw3i
Optimization and passive flow control using single-step deep reinforcement learning
[article]
2020
arXiv
pre-print
This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems. ...
, which paves the way for future progress in optimal flow control using this new class of methods. ...
The flow is described in a Cartesian coordinate system (x, y) with drag force (resp. lift force) positive in the stream-wise +x direction (resp. the cross-wise +y direction). ...
arXiv:2006.02979v1
fatcat:2wkiyqebe5cyxicm4pbboojvwm
Learning Unmanned Aerial Vehicle Control for Autonomous Target Following
[article]
2017
arXiv
pre-print
We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic ...
In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. ...
The Deep Deterministic Policy Gradient (DDPG) [20] algorithm, based on Deterministic Policy Gradient [26], maintains a parameterized actor function $\mu(s \mid \theta^{\mu})$ which specifies the current policy by deterministically ...
arXiv:1709.08233v1
fatcat:ongwsqarrzahvkekv7dxmhbm5m
Attentional Network for Visual Object Detection
[article]
2017
arXiv
pre-print
We propose augmenting deep neural networks with an attention mechanism for the visual object detection task. ...
Due to the lack of ground-truth annotations for the visual attention mechanism, we train our network using a reinforcement learning algorithm with policy gradients. ...
The policy gradient algorithm, in its simplest form, changes the policy parameters in the direction of the gradient of $J(\pi_\theta)$ by the gradient ascent update $\theta_{i+1} \leftarrow \theta_i + \alpha_i \nabla J(\pi_{\theta_i})$ for some choice of step ...
arXiv:1702.01478v1
fatcat:t3ibcr76gve6fel7goew2on7eq
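The gradient ascent update quoted in the entry above (new parameters equal old parameters plus step size times gradient) can be sketched as follows. This is only a toy illustration: the function name `gradient_ascent`, the quadratic stand-in objective, and the constant step sizes are assumptions, and a real policy gradient method would estimate the gradient from sampled trajectories rather than compute it exactly.

```python
def gradient_ascent(grad_J, theta0, step_sizes):
    """Apply theta <- theta + alpha * grad_J(theta) once per step size."""
    theta = theta0
    for alpha in step_sizes:
        theta = theta + alpha * grad_J(theta)
    return theta


def toy_grad(theta):
    """Gradient of the assumed toy objective J(theta) = -(theta - 3)^2."""
    return -2.0 * (theta - 3.0)


# 100 steps with a constant step size of 0.1 (an arbitrary choice).
theta_final = gradient_ascent(toy_grad, theta0=0.0, step_sizes=[0.1] * 100)
print(theta_final)  # approaches the maximizer theta = 3
```

Each step here shrinks the distance to the maximizer by a factor of 0.8, so the iterate converges geometrically; with noisy gradient estimates, decreasing step sizes are the usual choice.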
Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters
[article]
2021
arXiv
pre-print
We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it's possible to know a priori if a state representation will lend itself ...
The results from this analysis provide three main insights into how deep RL agents learn. ...
gradient methods. ...
arXiv:2103.06398v1
fatcat:ew3bkuxujbbsveme7pfx433d6a
Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control
[article]
2019
arXiv
pre-print
Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power ...
This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. ...
Early-stage ATSC methods solve optimization problems to find efficient coordination and control policies. ...
arXiv:1903.04527v1
fatcat:hmzh7562dvc4jinmc3exahhpgu
Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data
[article]
2020
arXiv
pre-print
We address these challenges using interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot manipulation. ...
This facilitates the design of interpretable and transferable geometric policies built on top of the learned representations, decoupling visual reasoning and control. ...
as in [21] , and then slightly perturbing knot coordinate positions for variation. ...
arXiv:2003.01835v1
fatcat:stifvkwa3bayjaa2xjtqatbaqa
Showing results 1 — 15 out of 5,025 results