Distributed Prioritized Experience Replay
[article]
2018
arXiv
pre-print
The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. ...
in a shared experience replay memory; the learner replays samples of experience and updates the neural network. ...
OUR CONTRIBUTION: DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY In this paper we extend prioritized experience replay to the distributed setting and show that this is a highly scalable approach to deep reinforcement ...
arXiv:1803.00933v1
fatcat:tmjsfjvpmjewfc5na53hur2wve
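For orientation, a minimal single-process sketch of the actor/learner split the snippet above describes: actors push experience (with an initial priority) into a shared prioritized replay memory, and the learner samples proportionally to priority, updates the network, and writes back refreshed priorities. The real system runs many actor processes and the learner in parallel; the names env_step, compute_td_errors, and update_network are placeholders for this illustration, not code from the paper.

```python
import random

class SharedReplay:
    """Toy stand-in for the shared prioritized replay memory: actors add
    experience with an initial priority, the learner samples proportionally
    to priority and writes back updated priorities."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:   # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idxs, [self.data[i] for i in idxs]

    def update_priorities(self, idxs, new_priorities):
        for i, p in zip(idxs, new_priorities):
            self.priorities[i] = p


def actor_step(replay, env_step, policy):
    """An actor generates experience locally and pushes it to the shared memory."""
    transition, initial_priority = env_step(policy)
    replay.add(transition, initial_priority)


def learner_step(replay, compute_td_errors, update_network, batch_size=32):
    """The learner replays prioritized samples, updates the network, and
    refreshes the priorities of the sampled transitions."""
    idxs, batch = replay.sample(batch_size)
    td_errors = compute_td_errors(batch)
    update_network(batch, td_errors)
    replay.update_priorities(idxs, [abs(e) + 1e-6 for e in td_errors])
```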
Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
2017
PROCEEDINGS OF THE THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-NINTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Furthermore, we propose a new sampling framework termed hierarchical prioritized experience replay to selectively choose experiences from the replay memories of each task domain to perform learning on ...
Second, we propose hierarchical prioritized experience replay to enhance the benefit of prioritization by regularizing the distribution of the sampled experiences from each domain. ...
However, prioritized replay introduces distribution bias to the sampled experiences, which means that the original state distribution cannot be preserved. ...
doi:10.1609/aaai.v31i1.10733
fatcat:znbzsfifmvfqzega4d444oyzia
Experience Replay with Likelihood-free Importance Weights
[article]
2020
arXiv
pre-print
We use a likelihood-free density ratio estimator over the replay buffer to assign the prioritization weights. ...
distribution of the current policy. ...
Prioritized Experience Replay based on Stationary Distributions Assume that d, the distribution the replay buffer D is sampled from, is supported on the entire space S × A, and that we have infinite samples ...
arXiv:2006.13169v1
fatcat:vbjy5zdezvgrfdbxnuyi72l5fe
Prioritized Experience Replay
[article]
2016
arXiv
pre-print
DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games. ...
In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. ...
PRIORITIZED REPLAY Using a replay memory leads to design choices at two levels: which experiences to store, and which experiences to replay (and how to do so). ...
arXiv:1511.05952v4
fatcat:mcttbjzpsvhhrkcupyt2cksqai
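A minimal sketch of proportional prioritization in the spirit of this abstract: replay probability grows with the magnitude of the TD error, and importance-sampling weights correct the bias that non-uniform sampling introduces. The exponents alpha and beta and the small epsilon follow the common convention; they are illustrative defaults, not the paper's exact settings.

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Sample indices with probability proportional to |TD error|^alpha and
    return importance-sampling weights that correct the induced bias."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idxs = np.random.choice(len(td_errors), size=batch_size, p=probs)
    n = len(td_errors)
    weights = (n * probs[idxs]) ** (-beta)   # w_i = (1 / (N * P(i)))^beta
    weights /= weights.max()                 # normalize for stability
    return idxs, weights

# toy usage: transitions with larger TD error are replayed more often
errors = np.array([0.1, 0.5, 2.0, 0.05])
idxs, w = sample_prioritized(errors, batch_size=3)
```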
Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay
[article]
2021
arXiv
pre-print
In this paper, we develop a novel algorithm, Batch Prioritizing Experience Replay via KL Divergence (KLPER), which prioritizes batches of transitions rather than directly prioritizing each transition. ...
Therefore, experience replay prioritization algorithms recalculate the significance of a transition when the corresponding transition is sampled to gain computational efficiency. ...
In this paper, we introduce a novel experience replay prioritization method, Batch Prioritized Experience Replay via KL Divergence, KLPER. ...
arXiv:2111.01865v2
fatcat:cndqpix3bbhuniy5aidurpjd6q
Bias-Reduced Hindsight Experience Replay with Virtual Goal Prioritization
[article]
2021
arXiv
pre-print
Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. ...
First, we prioritize virtual goals from which the agent will learn more valuable information. ...
In all algorithms we used prioritized experience replay (PER) [16] . ...
arXiv:1905.05498v5
fatcat:2asfx7vb5bhrdntwspwzkyk4me
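For context, a minimal sketch of the hindsight relabeling that HER builds on: stored transitions are replayed with goals taken from later states of the same episode, so sparse rewards become informative. This shows the generic 'future' relabeling strategy, not the paper's bias-reduced, goal-prioritized variant; the achieved_goal field and reward_fn signature are assumed conventions.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Generate k relabeled transitions per step: substitute a goal achieved
    later in the episode and recompute the reward against that goal.
    `episode` is a list of (obs, action, next_obs, goal) tuples where obs and
    next_obs are dicts containing an "achieved_goal" entry."""
    relabeled = []
    for t, (obs, action, next_obs, goal) in enumerate(episode):
        for _ in range(k):
            future = random.randint(t, len(episode) - 1)   # a later timestep
            new_goal = episode[future][2]["achieved_goal"]
            reward = reward_fn(next_obs["achieved_goal"], new_goal)
            relabeled.append((obs, action, next_obs, new_goal, reward))
    return relabeled
```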
Curiosity-Driven Experience Prioritization via Density Estimation
[article]
2020
arXiv
pre-print
In our experiments, we combined CDP with Deep Deterministic Policy Gradient (DDPG) with or without Hindsight Experience Replay (HER). ...
To address this problem, we propose a novel Curiosity-Driven Prioritization (CDP) framework to encourage the agent to over-sample those trajectories that have rare achieved goal states. ...
Experience Replay To the best of our knowledge, the most similar method to CDP is Prioritized Experience Replay (PER) [41]. ...
arXiv:1902.08039v3
fatcat:vuvhi5evz5e4tcmpvseijifkhy
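A rough sketch in the spirit of this abstract: estimate the density of achieved goal states and give rare (low-density) trajectories a larger replay probability. The kernel-density estimator and the inverse-density score below are illustrative choices, not the paper's exact estimator or ranking scheme.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def curiosity_priorities(achieved_goals, bandwidth=0.2, eps=1e-6):
    """Return normalized sampling probabilities over trajectories: priority
    grows as the estimated density of a trajectory's achieved goal shrinks,
    so rare goal states are over-sampled.
    `achieved_goals` is an (N, d) array, one goal per trajectory."""
    goals = np.asarray(achieved_goals)
    kde = KernelDensity(bandwidth=bandwidth).fit(goals)
    log_density = kde.score_samples(goals)           # log p(g) per trajectory
    priorities = 1.0 / (np.exp(log_density) + eps)   # rare goals -> high priority
    return priorities / priorities.sum()
```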
A DPDK-Based Acceleration Method for Experience Sampling of Distributed Reinforcement Learning
[article]
2022
arXiv
pre-print
latencies for prioritized experience sampling by 21.9% to 29.1%. ...
As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication ...
Ape-X introduces a prioritized experience replay for large-scale distributed reinforcement learning systems that consist of Actor processes, experience replay memory, and Learner process. ...
arXiv:2110.13506v2
fatcat:s34wxpnohjdypmcokxfroi22mi
Advances in Experience Replay
[article]
2018
arXiv
pre-print
This project combines recent advances in experience replay techniques, namely, Combined Experience Replay (CER), Prioritized Experience Replay (PER), and Hindsight Experience Replay (HER). ...
CER always adds the most recent experience to the batch. PER chooses which experiences should be replayed based on how beneficial they will be towards learning. ...
Prioritized Experience Replay (PER) In regular experience replay, all transitions are sampled uniformly. ...
arXiv:1805.05536v1
fatcat:dtdfqfgbznfnbaosp3e2jfthbi
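The CER idea quoted above admits a very short sketch: draw a minibatch as usual (uniformly here) and always append the most recently stored transition, so the newest experience is guaranteed to be replayed at least once.

```python
import random

def sample_with_cer(buffer, batch_size):
    """Combined Experience Replay: a uniform minibatch plus the newest transition."""
    batch = random.sample(buffer, k=batch_size - 1)
    batch.append(buffer[-1])   # the most recent experience is always included
    return batch
```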
Double Prioritized State Recycled Experience Replay
[article]
2020
arXiv
pre-print
A prior work called prioritized experience replay was developed where experiences are prioritized, so as to replay experiences seeming to be more important more frequently. ...
In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, prioritizing the experiences in both training stage and storing stage, as well as replacing the experiences ...
palliate the non-stationary distribution problem. ...
arXiv:2007.03961v3
fatcat:ze7h7yqtmvdnvldatbfiqdgbki
Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
2021
IEEE Access
This paper proposes a method for prioritizing the replay experience referred to as Hindsight Goal Ranking (HGR) in overcoming the limitation of Hindsight Experience Replay (HER) that generates hindsight ...
The proposed method combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, accelerates learning significantly faster than that without any prioritization ...
Also, in [41] , the extension of Prioritized Experience Replay (PER) tries to prioritize experiences instead. ...
doi:10.1109/access.2021.3069975
fatcat:fgfnq5afybgr7cpv7f67w3r3wy
Prioritized Level Replay
[article]
2021
arXiv
pre-print
We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future ...
The following sections describe how Prioritized Level Replay updates the replay distribution P_replay(l | Λ_seen), namely through level scoring and staleness-aware prioritization. ...
(d) to determine whether to replay a level sampled from the replay distribution P_replay(l | Λ_seen) or to experience a new, unseen level from Λ_train, according to some distribution P_new(l | Λ_train ...
arXiv:2010.03934v4
fatcat:a4urejqeirf27adktdez5p6w6q
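A rough sketch of the sampling decision the snippet describes: with some probability draw an unseen training level, otherwise draw a seen level from a replay distribution that mixes a score-based term (higher estimated learning potential first) with a staleness term (levels not replayed for a long time get more weight). The rank-based score transform and the mixing coefficient rho follow the usual presentation of PLR; treat the details as an approximation, not the paper's exact formulas.

```python
import random

def sample_level(seen, unseen, scores, last_sampled, step, rho=0.3, p_new=0.5):
    """Return a level id: unseen levels with probability p_new (while any remain),
    otherwise a seen level drawn from a score/staleness mixture."""
    if unseen and random.random() < p_new:
        return unseen.pop()                    # experience a new, unseen level
    # score term: rank levels by estimated learning potential, weight by 1/rank
    ranks = {l: r + 1 for r, l in enumerate(sorted(seen, key=lambda l: -scores[l]))}
    p_score = {l: 1.0 / ranks[l] for l in seen}
    # staleness term: levels not sampled recently get more weight
    p_stale = {l: step - last_sampled[l] + 1 for l in seen}
    z_score, z_stale = sum(p_score.values()), sum(p_stale.values())
    weights = [(1 - rho) * p_score[l] / z_score + rho * p_stale[l] / z_stale
               for l in seen]
    return random.choices(seen, weights=weights, k=1)[0]
```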
One Pass ImageNet
[article]
2021
arXiv
pre-print
The idea of prioritized experience replay [15] is to add a priority score to each example and sample from the buffer according to the probability distribution normalized from the priority scores. ...
Let p(x) be the distribution of the original data and q(x) be the distribution in the replay buffer. The original objective is E_p[ℓ(x; θ)]. ...
arXiv:2111.01956v1
fatcat:o3mnbg6t6vdhtbwwkdyx5z3gbq
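In the notation of the snippet above, a sketch of the importance-weighted correction that both this entry and the likelihood-free importance-weights entry rely on: rewrite the objective under the data distribution p as an expectation under the buffer distribution q, weighted by the density ratio.

```latex
\mathbb{E}_{p}\bigl[\ell(x;\theta)\bigr]
  \;=\; \mathbb{E}_{q}\!\left[\frac{p(x)}{q(x)}\,\ell(x;\theta)\right]
  \;\approx\; \frac{1}{N}\sum_{i=1}^{N} w(x_i)\,\ell(x_i;\theta),
  \qquad x_i \sim q,\quad w(x_i) \approx \frac{p(x_i)}{q(x_i)} .
```

The ratio w(x) can be estimated without evaluating either density explicitly, e.g. by a classifier trained to distinguish buffer samples from freshly collected samples, which is the likelihood-free route the density-ratio estimator mentioned earlier in these results takes.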
Combining Experience Replay with Exploration by Random Network Distillation
[article]
2019
arXiv
pre-print
We are able to do it by using a new technique named Prioritized Oversampled Experience Replay (POER), that has been built upon the definition of what is the important experience useful to replay. ...
More in detail, we show how to efficiently combine Intrinsic Rewards with Experience Replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and consequently better results ...
University of Bologna, Department of Computer Science and Engineering (DISI), Mura Anteo Zamboni 7, 40127, Bologna, Italy ... maybe due to the insufficient amount of training time, or due to the adopted replay ...
arXiv:1905.07579v1
fatcat:m7js5xekwzhyfkl5lmey3augh4
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
[article]
2020
arXiv
pre-print
For evaluation of this framework, we combine it with Deep Deterministic Policy Gradient, both with or without Hindsight Experience Replay. ...
Secondly, we developed a maximum entropy-based prioritization framework to optimize the proposed objective. ...
Prioritized experience replay was introduced by Schaul et al. (2016) as an improve-ment to the experience replay in DQN (Mnih et al., 2015) . ...
arXiv:1905.08786v3
fatcat:tw5xs4j4rfbm7fkdz7mhjlqppy
Showing results 1 — 15 out of 6,299 results