
Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay

Chaohai Kang, Chuiting Rong, Weijian Ren, Fengcai Huo, Pengyun Liu
2021 IEEE Access  
doi:10.1109/access.2021.3074535 fatcat:eyruw4ha6vh2ndxf5dm3cznrhq

Learning Agents with Prioritization and Parameter Noise in Continuous State and Action Space [chapter]

Rajesh Mangannavar, Gopalakrishnan Srinivasaraghavan
2019 Lecture Notes in Computer Science  
Deep Q-learning networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are two such methods that have shown state-of-the-art results in recent times.  ...  One of the recent breakthroughs in reinforcement learning is the use of deep neural networks as function approximators to approximate the value function or Q-function in a reinforcement learning scheme.  ...  PRIORITIZED EXPERIENCE REPLAY: The prioritized experience replay algorithm is a further improvement on the deep Q-learning methods and can be applied to both DQN and Double DQN.  ... 
doi:10.1007/978-3-030-22796-8_22 fatcat:gahibb7yh5fdtglkoow7htapfu
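The snippet above sketches the idea behind prioritized experience replay: transitions are sampled with probability proportional to a priority derived from their TD error, and importance-sampling weights correct the bias this introduces. A minimal illustrative buffer follows (array-based rather than the sum-tree used in practice; the class name and the alpha/beta hyperparameter defaults are conventional choices for illustration, not taken from any of the papers listed here):

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.beta = beta        # importance-sampling correction strength
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=np.random):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = rng.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias;
        # normalizing by the maximum keeps updates bounded.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # A small epsilon keeps every transition sampleable.
        self.priorities[idx] = np.abs(td_errors) + eps
```

Production implementations replace the linear-time `sample` with a sum-tree so that both sampling and priority updates are logarithmic in the buffer size.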

Review, Analyze, and Design a Comprehensive Deep Reinforcement Learning Framework [article]

Ngoc Duy Nguyen, Thanh Thi Nguyen, Hai Nguyen, Saeid Nahavandi
2020 arXiv   pre-print
Finally, to enforce generalization, the proposed architecture does not depend on a specific RL algorithm, a network configuration, the number of agents, or the type of agents.  ...  For this reason, we designed a deep RL-based framework that strictly ensures flexibility, robustness, and scalability.  ...  [table excerpt] Value-based method, DQN: uses a deep convolutional network to directly process raw graphical data; requires experience replay; drawback: excessive memory usage [81]  ... 
arXiv:2002.11883v1 fatcat:yziq6kwryvh5hiwjm6ju2r5srq

Regularly Updated Deterministic Policy Gradient Algorithm [article]

Shuai Han and Wenbo Zhou and Shuai Lü and Jiayu Yu
2020 arXiv   pre-print
The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications.  ...  This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm for these problems.  ...  Improvement methods for the target value calculation and network update of deterministic policy gradient methods, such as the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm [18] and smoothie  ... 
arXiv:2007.00169v1 fatcat:p4yiniujmbcclbkt6ujsohvwhq
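The TD3 improvements mentioned in the snippet center on how the target value is computed: two target critics are trained and the smaller of their estimates is used (clipped double-Q), and the target action is perturbed with clipped noise (target-policy smoothing). A rough sketch of that target computation, with placeholder callables standing in for the networks and with hyperparameter defaults taken from common usage rather than any specific paper:

```python
import numpy as np

def td3_target(q1_target, q2_target, policy_target, next_states, rewards,
               dones, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=np.random):
    """Clipped double-Q target with target-policy smoothing (TD3 sketch).

    q1_target / q2_target: callables (states, actions) -> Q-value array.
    policy_target: callable states -> actions array.
    """
    a_next = policy_target(next_states)
    # Target-policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip back into the valid action range.
    noise = np.clip(rng.normal(0.0, noise_std, a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, action_low, action_high)
    # Clipped double-Q: take the minimum of the two target critics
    # to counteract overestimation of the value estimate.
    q_next = np.minimum(q1_target(next_states, a_next),
                        q2_target(next_states, a_next))
    return rewards + gamma * (1.0 - dones) * q_next
```

The third TD3 ingredient, delayed policy updates, lives in the training loop rather than here: the actor and target networks are updated only every few critic steps.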

Distributed Prioritized Experience Replay [article]

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
2018 arXiv   pre-print
The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors.  ...  in a shared experience replay memory; the learner replays samples of experience and updates the neural network.  ...  Figure 2: policy gradient system based on DDPG (Lillicrap et al., 2016), an implementation of deterministic policy gradients (Silver et al., 2014), also similar to older methods (Werbos, 1990; Prokhorov  ... 
arXiv:1803.00933v1 fatcat:tmjsfjvpmjewfc5na53hur2wve

Attentive Experience Replay

Peiquan Sun, Wengang Zhou, Houqiang Li
Experience replay (ER) has become an important component of deep reinforcement learning (RL) algorithms. ER enables RL algorithms to reuse past experiences for the update of current policy.  ...  To tackle this issue, we propose a new replay strategy to prioritize the transitions that contain states frequently visited by current policy.  ...  Twin delayed deep deterministic policy gradient (TD3) (Fujimoto, van Hoof, and Meger 2018) makes several improvements on DDPG to alleviate the overestimation of the value-network.  ... 
doi:10.1609/aaai.v34i04.6049 fatcat:ivpircajqveftcpi3bb4qomtt4
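One plausible reading of the replay strategy described above is: draw an oversized candidate set uniformly, score each candidate by how similar its stored state is to states the current policy has recently visited, and keep the most similar ones. The similarity measure below (negative squared distance to the nearest recent state) and all names are stand-ins chosen for illustration, not the paper's definitions:

```python
import numpy as np

def attentive_sample(states, recent_states, batch_size, oversample=4,
                     rng=None):
    """Bias replay toward states similar to recently visited ones
    (illustrative reading of an attentive replay strategy).

    states: (N, d) array of stored transition states.
    recent_states: (m, d) states visited by the current policy.
    Returns indices of the chosen transitions.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = len(states)
    # Uniformly draw an oversized candidate set ...
    cand = rng.choice(n, size=min(n, oversample * batch_size), replace=False)
    # ... then score each candidate by similarity (negative squared
    # distance) to its nearest recently visited state.
    d2 = ((states[cand][:, None, :] - recent_states[None, :, :]) ** 2).sum(-1)
    scores = -d2.min(axis=1)
    # Keep the batch_size most similar candidates.
    return cand[np.argsort(scores)[-batch_size:]]
```

Compared with TD-error prioritization, this scheme needs no per-transition priority bookkeeping, at the cost of computing state similarities at sampling time.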

Solving Continuous Control with Episodic Memory [article]

Igor Kuznetsov, Andrey Filchenkov
2021 arXiv   pre-print
We further improve performance by introducing episodic-based replay buffer prioritization.  ...  We evaluate our algorithm on OpenAI gym domains and show greater sample-efficiency compared with the state-of-the-art model-free off-policy algorithms.  ...  Our algorithm builds on the Deep Deterministic Policy Gradient [Lillicrap et al., 2015] by modifying the critic's objective and introducing episodic-based prioritized experience replay.  ... 
arXiv:2106.08832v1 fatcat:gjalegijszeidgx4dv4yxmke4u

Experience-driven Networking: A Deep Reinforcement Learning based Approach [article]

Zhiyuan Xu, Jian Tang, Jingsong Meng, Weiyi Zhang, Yanzhi Wang, Chi Harold Liu, Dejun Yang
2018 arXiv   pre-print
We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework particularly for TE.  ...  offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method (for continuous control), Deep Deterministic Policy  ...  [16] proposed an actor-critic-based and model-free algorithm, DDPG, based on the deterministic policy gradient that can operate over continuous action spaces. Gu et al.  ... 
arXiv:1801.05757v1 fatcat:vxb4qrmyrnb3vixtzy7b3rkpyu

Dueling Network Architectures for Deep Reinforcement Learning [article]

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
2016 arXiv   pre-print
Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function.  ...  In recent years there have been many successes of using deep representations in reinforcement learning.  ...  Prioritized Replay A recent innovation in prioritized experience replay (Schaul et al., 2016) built on top of DDQN and further improved the state-of-the-art.  ... 
arXiv:1511.06581v3 fatcat:pg6taaspnbg3tkzc5e5tbl7rm4
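The dueling architecture described above combines its two estimators into Q-values by adding the advantage stream to the value stream after centering the advantages, which keeps the decomposition identifiable (otherwise a constant could shift freely between V and A). A sketch of that aggregation step, with plain linear layers standing in for the network heads:

```python
import numpy as np

def dueling_q_values(features, w_value, w_adv):
    """Combine value and advantage streams (dueling architecture sketch).

    features: (batch, d) shared representation from the torso network;
    w_value: (d, 1) value head; w_adv: (d, n_actions) advantage head.
    The heads are plain linear maps here purely for illustration.
    """
    value = features @ w_value         # V(s): (batch, 1)
    advantage = features @ w_adv       # A(s, a): (batch, n_actions)
    # Subtract the mean advantage so the decomposition is identifiable:
    # Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')).
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```

A useful sanity check on this aggregation: averaging the resulting Q-values over actions recovers the value stream exactly, since the centered advantages sum to zero.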

Action Branching Architectures for Deep Reinforcement Learning [article]

Arash Tavakoli, Fabio Pardo, Petar Kormushev
2019 arXiv   pre-print
Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).  ...  To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN).  ... 
arXiv:1711.08946v2 fatcat:qtwkliw3nnctbfr4jvfmnbhwxu

Reinforcement Learning and Video Games [article]

Yue Zheng
2019 arXiv   pre-print
Reinforcement learning with deep learning methods has exceeded human-level performance in game-playing AI, according to DeepMind's experiments on Go and Atari games.  ...  Batch normalization is a method to address internal covariate shift in deep neural networks. The positive influence of this on reinforcement learning has also been demonstrated in this study.  ...  Although policy-based methods such as proximal policy optimization are a good choice too, only DQN, double DQN, DQN with prioritized experience replay and dueling DQN will be investigated in this project due  ... 
arXiv:1909.04751v1 fatcat:ohrjo2yelncchd5brbhz3yc4cm

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning [article]

Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos
2018 arXiv   pre-print
Next, we introduce the β-leave-one-out policy gradient algorithm which improves the trade-off between variance and bias by using action values as a baseline.  ...  Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of neighboring observations for more efficient replay prioritization.  ...  THE REACTOR: The Reactor is a combination of four novel contributions on top of recent improvements to both deep value-based RL and policy-gradient algorithms.  ... 
arXiv:1704.04651v2 fatcat:46wvnqppnfc4pbownydwdsv3cy

Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network

Hamid Ali, Hammad Majeed, Imran Usman, Khaled A. Almejalli, Haider Abbas
2021 Wireless Communications and Mobile Computing  
We demonstrate our approach on different well-known continuous control simulated environments.  ...  To overcome this problem, we propose a dual policy optimization framework, in which two independent policies are trained.  ...  Instead of prioritizing samples based on TD error, we can prioritize samples based on entropy.  ... 
doi:10.1155/2021/9920591 fatcat:tx2whyke3zb4fbg64g4l27wwiy

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward [article]

Hassam Ullah Sheikh, Ladislau Bölöni
2020 arXiv   pre-print
To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to  ...  We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves a significantly better and more stable performance than the direct adaptation of the MADDPG  ...  This allows us to apply performance enhancement techniques such as Prioritized Experience Replay (PER) [14] and Twin Delayed Deep Deterministic Policy Gradients (TD3) [15] to tackle the overestimation  ... 
arXiv:2003.10598v1 fatcat:cervcqlhc5dxlfmqrjyl3c5l3u

ChainerRL: A Deep Reinforcement Learning Library [article]

Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa
2021 arXiv   pre-print
The ChainerRL source code can be found on GitHub:  ...  In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework.  ...  We thank Kohei Hayashi and Jason Naradowsky for useful comments on how to improve the paper.  ... 
arXiv:1912.03905v2 fatcat:awe4liu7qfaevesw2dqeutxcoy