Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay
2021
IEEE Access
Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay (November 2020) ...
and 2 represent noise. ...
doi:10.1109/access.2021.3074535
fatcat:eyruw4ha6vh2ndxf5dm3cznrhq
Learning Agents with Prioritization and Parameter Noise in Continuous State and Action Space
[chapter]
2019
Lecture Notes in Computer Science
Deep Q-learning networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are two such methods that have shown state-of-the-art results in recent times. ...
One of the recent breakthroughs in reinforcement learning is the use of deep neural networks as function approximators to approximate the value function or q-function in a reinforcement learning scheme ...
PRIORITIZED EXPERIENCE REPLAY The prioritized experience replay algorithm is a further improvement on the deep Q-learning methods and can be applied to both DQN and Double DQN. ...
doi:10.1007/978-3-030-22796-8_22
fatcat:gahibb7yh5fdtglkoow7htapfu
Review, Analyze, and Design a Comprehensive Deep Reinforcement Learning Framework
[article]
2020
arXiv
pre-print
Finally, to enforce generalization, the proposed architecture does not depend on a specific RL algorithm, a network configuration, the number of agents, or the type of agents. ...
For this reason, we designed a deep RL-based framework that strictly ensures flexibility, robustness, and scalability. ...
Value-based method, DQN: Use a deep convolutional network to directly process raw graphical data and ... Requirements: experience replay, ... Drawbacks: excessive memory usage, ... Implementation: [81]. ...
arXiv:2002.11883v1
fatcat:yziq6kwryvh5hiwjm6ju2r5srq
Regularly Updated Deterministic Policy Gradient Algorithm
[article]
2020
arXiv
pre-print
Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications. ...
This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm for these problems. ...
Improvement methods on the target value calculation and network update of deterministic policy gradient methods, such as Twin Delayed Deep Deterministic (TD3) policy gradient algorithm [18] and smoothie ...
arXiv:2007.00169v1
fatcat:p4yiniujmbcclbkt6ujsohvwhq
Distributed Prioritized Experience Replay
[article]
2018
arXiv
pre-print
The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. ...
in a shared experience replay memory; the learner replays samples of experience and updates the neural network. ...
policy gradient system based on DDPG (Lillicrap et al., 2016), an implementation of deterministic policy gradients (Silver et al., 2014), also similar to older methods (Werbos, 1990; Prokhorov ...
arXiv:1803.00933v1
fatcat:tmjsfjvpmjewfc5na53hur2wve
Attentive Experience Replay
2020
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Experience replay (ER) has become an important component of deep reinforcement learning (RL) algorithms. ER enables RL algorithms to reuse past experiences for the update of current policy. ...
To tackle this issue, we propose a new replay strategy to prioritize the transitions that contain states frequently visited by current policy. ...
Twin delayed deep deterministic policy gradient (TD3) (Fujimoto, van Hoof, and Meger 2018) makes several improvements on DDPG to alleviate the overestimation of the value-network. ...
doi:10.1609/aaai.v34i04.6049
fatcat:ivpircajqveftcpi3bb4qomtt4
Solving Continuous Control with Episodic Memory
[article]
2021
arXiv
pre-print
We further improve performance by introducing episodic-based replay buffer prioritization. ...
We evaluate our algorithm on OpenAI gym domains and show greater sample-efficiency compared with the state-of-the-art model-free off-policy algorithms. ...
Our algorithm builds on the Deep Deterministic Policy Gradient [Lillicrap et al., 2015] by modifying critic's objective and introducing episodic-based prioritized experience replay. ...
arXiv:2106.08832v1
fatcat:gjalegijszeidgx4dv4yxmke4u
Experience-driven Networking: A Deep Reinforcement Learning based Approach
[article]
2018
arXiv
pre-print
We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework particularly for TE. ...
offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method (for continuous control), Deep Deterministic Policy ...
[16] proposed an actor-critic-based and model-free algorithm, DDPG, based on the deterministic policy gradient that can operate over continuous action spaces. Gu et al. ...
arXiv:1801.05757v1
fatcat:vxb4qrmyrnb3vixtzy7b3rkpyu
Dueling Network Architectures for Deep Reinforcement Learning
[article]
2016
arXiv
pre-print
Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. ...
In recent years there have been many successes of using deep representations in reinforcement learning. ...
Prioritized Replay A recent innovation in prioritized experience replay (Schaul et al., 2016) built on top of DDQN and further improved the state-of-the-art. ...
arXiv:1511.06581v3
fatcat:pg6taaspnbg3tkzc5e5tbl7rm4
Action Branching Architectures for Deep Reinforcement Learning
[article]
2019
arXiv
pre-print
Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG). ...
To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). ...
arXiv:1711.08946v2
fatcat:qtwkliw3nnctbfr4jvfmnbhwxu
Reinforcement Learning and Video Games
[article]
2019
arXiv
pre-print
Reinforcement learning has exceeded human-level performance in game playing AI with deep learning methods according to the experiments from DeepMind on Go and Atari games. ...
Batch normalization is a method to solve internal covariate shift problems in deep neural networks. The positive influence of this on reinforcement learning has also been proved in this study. ...
Although policy-based methods such as proximal policy gradient are a good choice too, only DQN, double DQN, DQN with prioritized experience replay and dueling DQN will be investigated in this project due ...
arXiv:1909.04751v1
fatcat:ohrjo2yelncchd5brbhz3yc4cm
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
[article]
2018
arXiv
pre-print
Next, we introduce the β-leave-one-out policy gradient algorithm which improves the trade-off between variance and bias by using action values as a baseline. ...
Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of neighboring observations for more efficient replay prioritization. ...
THE REACTOR The Reactor is a combination of four novel contributions on top of recent improvements to both deep value-based RL and policy-gradient algorithms. ...
arXiv:1704.04651v2
fatcat:46wvnqppnfc4pbownydwdsv3cy
Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network
2021
Wireless Communications and Mobile Computing
We demonstrate our approach on different well known continuous control simulated environments. ...
To overcome this problem, we propose a dual policy optimization framework, in which two independent policies are trained. ...
Instead of prioritizing samples based on TD error, we can prioritize samples based on entropy. ...
doi:10.1155/2021/9920591
fatcat:tx2whyke3zb4fbg64g4l27wwiy
Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward
[article]
2020
arXiv
pre-print
To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to ...
We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves a significantly better and more stable performance than the direct adaptation of the MADDPG ...
This allows us to apply performance enhancement techniques such as Prioritized Experience Replay (PER) [14] and Twin Delayed Deep Deterministic Policy Gradients (TD3) [15] to tackle the overestimation ...
arXiv:2003.10598v1
fatcat:cervcqlhc5dxlfmqrjyl3c5l3u
ChainerRL: A Deep Reinforcement Learning Library
[article]
2021
arXiv
pre-print
The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl. ...
In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework. ...
We thank Kohei Hayashi and Jason Naradowsky for useful comments on how to improve the paper. ...
arXiv:1912.03905v2
fatcat:awe4liu7qfaevesw2dqeutxcoy