5,025 Hits in 2.9 sec

Coordinate-wise Control Variates for Deep Policy Gradients [article]

Yuanyi Zhong, Yuan Zhou, Jian Peng
2021 arXiv   pre-print
This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies.  ...  The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice.  ...  Conclusion We propose the coordinate-wise control variates for variance reduction in deep policy gradient methods.  ... 
arXiv:2107.04987v2 fatcat:d6dkth7kbbh4ppzimkramacvlu

Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM [article]

Muhammad Ilyas Patel, Shrey Singla, Razeem Ahmad Ali Mattathodi, Sumit Sharma, Deepam Gautam, Srinivasa Rao Kundeti
2021 arXiv   pre-print
We use a modified HighRes3DNet model for solving brain MRI volumetric landmark detection problem.  ...  , for better localization of the existing landmarks, in order to identify and locate the important atlas landmarks even in oblique scans.  ...  For an ideal MRI image, the policy DA1 adds one of the patient side variations, and along with that, it adds one of the machine side variations. The policy DA2 only simulates machine-side variations.  ... 
arXiv:2111.00837v1 fatcat:q3rkvcnu6jeojbcql7lmfpxsbm

Gradient Monitored Reinforcement Learning [article]

Mohammed Sharafath Abdul Hameed
2020 arXiv   pre-print
The approach is applied to two discrete (Multi-Robot Co-ordination problem and Atari games) and one continuous control task (MuJoCo) using Advantage Actor-Critic (A2C) and Proximal Policy Optimization  ...  This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning.  ...  Notable examples of the latter include value-function-based methods like deep Q-networks [6], policy gradient methods like deep deterministic policy gradient [5], Advantage Actor Critic (A2C) [7],  ... 
arXiv:2005.12108v1 fatcat:c4cgoylw2rcuhfbsdbk5hfdsfu

Multiagent Soft Q-Learning [article]

Ermo Wei, Drew Wicke, David Freelan, Sean Luke
2018 arXiv   pre-print
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games.  ...  To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls.  ...  We thank Tuomas Haarnoja and Haoran Tang for their helpful comments on implementing Soft Q-Learning.  ... 
arXiv:1804.09817v1 fatcat:qxwspauyxzgr3ikf6cjnylu6xa

Benchmarking Deep Reinforcement Learning for Continuous Control [article]

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
2016 arXiv   pre-print
Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning.  ...  However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark.  ...  Acknowledgements We thank Emo Todorov and Yuval Tassa for providing the MuJoCo simulator, and Sergey Levine, Aviv Tamar, Chelsea Finn, and the anonymous ICML reviewers for insightful comments.  ... 
arXiv:1604.06778v3 fatcat:lceiaesdbvallnt57nbxp7b42q

Team Deep Mixture of Experts for Distributed Power Control [article]

Matteo Zecchin, David Gesbert, Marios Kountouris
2020 arXiv   pre-print
We consider the decentralized power control problem as an example to showcase the validity of the proposed model and to compare it against other power control algorithms.  ...  In particular, it was established that DNNs can be used to derive policies that are robust with respect to the information noise statistic affecting the local information (e.g.  ...  policy for the current uncertainty levels.  ... 
arXiv:2007.14147v1 fatcat:34i4vsztvrei5izhldw2xevxrm

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control [article]

Yuguang Yang
2019 arXiv   pre-print
learning optimal control policy for its own stage.  ...  Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture, that can enable finding of optimal control policy of control tasks consisting of multiple  ...  Second, SDQL allows stacking different types of Q learning sub-networks, such as the Deep Q Network (DQN) [6] and actor-critic architecture like Deep Deterministic Policy Gradient (DDPG) network [7]  ... 
arXiv:1911.10684v1 fatcat:zmmcpfx46jcqnd44xpopltoxde

Towards Generalization and Data Efficient Learning of Deep Robotic Grasping [article]

Zhixin Chen, Mengxiang Lin, Zhixin Jia, Shibo Jian
2020 arXiv   pre-print
Deep reinforcement learning (DRL) has been proven to be a powerful paradigm for learning complex control policy autonomously.  ...  In this paper, we propose a DRL based robotic visual grasping framework, in which visual perception and control policy are trained separately rather than end-to-end.  ...  For our framework, we adopt a policy gradient method called Proximal Policy Optimization (PPO) [16] which is favorable for high dimension continuous robotic control problem.  ... 
arXiv:2007.00982v1 fatcat:76cmqwfz5bcp5kbae6xhttzqsa

Deep Deterministic Policy Gradient for Urban Traffic Light Control [article]

Noe Casas
2017 arXiv   pre-print
In order to overcome the large scale of the available state information, we propose to rely on the ability of deep Learning approaches to handle large input spaces, in the form of Deep Deterministic Policy  ...  Gradient (DDPG) algorithm.  ...  Deep Network Architecture Our neural architecture consists in a Deep Deterministic Actor-Critic Policy Gradient approach.  ... 
arXiv:1703.09035v2 fatcat:ivjtwotq4zgzpffiem5urqlw3i

Optimization and passive flow control using single-step deep reinforcement learning [article]

H. Ghraieb, J. Viquerat, A. Larcher, P. Meliga, E. Hachem
2020 arXiv   pre-print
This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems.  ...  , which paves the way for future progress in optimal flow control using this new class of methods.  ...  The flow is described in a Cartesian coordinate system (x, y) with drag force (resp. lift force) positive in the stream-wise +x direction (resp. the cross-wise +y direction).  ... 
arXiv:2006.02979v1 fatcat:2wkiyqebe5cyxicm4pbboojvwm

Learning Unmanned Aerial Vehicle Control for Autonomous Target Following [article]

Siyi Li, Tianbo Liu, Chi Zhang, Dit-Yan Yeung, Shaojie Shen
2017 arXiv   pre-print
We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic  ...  In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target.  ...  The Deep Deterministic Policy Gradient (DDPG) [20] algorithm, based on Deterministic Policy Gradient [26], maintains a parameterized actor function $\mu(s \mid \theta^{\mu})$ which specifies the current policy by deterministically  ... 
arXiv:1709.08233v1 fatcat:ongwsqarrzahvkekv7dxmhbm5m

Attentional Network for Visual Object Detection [article]

Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-massoud Farahmand
2017 arXiv   pre-print
We propose augmenting deep neural networks with an attention mechanism for the visual object detection task.  ...  Due to the lack of ground truth annotations of the visual attention mechanism, we train our network using a reinforcement learning algorithm with policy gradients.  ...  The policy gradient algorithm, in its simplest form, changes the policy parameters in the direction of the gradient of $J(\pi_\theta)$ by the gradient ascent update, $\theta_{i+1} \leftarrow \theta_i + \alpha_i \nabla J(\pi_{\theta_i})$ for some choice of step  ... 
arXiv:1702.01478v1 fatcat:t3ibcr76gve6fel7goew2on7eq

Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters [article]

Trevor A. McInroe, Michael Spurrier, Jennifer Sieber, Stephen Conneely
2021 arXiv   pre-print
We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it's possible to know a priori if a state representation will lend itself  ...  The results from this analysis provide three main insights into how deep RL agents learn.  ...  gradient methods.  ... 
arXiv:2103.06398v1 fatcat:ew3bkuxujbbsveme7pfx433d6a

Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control [article]

Tianshu Chu, Jie Wang, Lara Codecà, Zhaojian Li
2019 arXiv   pre-print
Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power  ...  This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC.  ...  Early-stage ATSC methods solve optimization problems to find efficient coordination and control policies.  ... 
arXiv:1903.04527v1 fatcat:hmzh7562dvc4jinmc3exahhpgu

Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data [article]

Priya Sundaresan, Jennifer Grannen, Brijen Thananjeyan, Ashwin Balakrishna, Michael Laskey, Kevin Stone, Joseph E. Gonzalez, Ken Goldberg
2020 arXiv   pre-print
We address these challenges using interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot manipulation.  ...  This facilitates the design of interpretable and transferable geometric policies built on top of the learned representations, decoupling visual reasoning and control.  ...  as in [21] , and then slightly perturbing knot coordinate positions for variation.  ... 
arXiv:2003.01835v1 fatcat:stifvkwa3bayjaa2xjtqatbaqa