
Distributed Prioritized Experience Replay [article]

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
2018 arXiv   pre-print
The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors.  ...  in a shared experience replay memory; the learner replays samples of experience and updates the neural network.  ...  OUR CONTRIBUTION: DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY In this paper we extend prioritized experience replay to the distributed setting and show that this is a highly scalable approach to deep reinforcement  ... 
arXiv:1803.00933v1
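The Ape-X snippet above describes the data flow: many actors write locally prioritized experience into a shared replay memory, and a learner samples from it by priority. Below is a deliberately single-process sketch of that flow; all names are illustrative, and the real system runs actors and the learner as separate distributed processes.

```python
import random

random.seed(0)

shared_replay = []                # (priority, transition) pairs, the shared memory

def actor_step(actor_id, step):
    """An actor generates experience and attaches an initial priority
    computed locally, so data arrives in the replay memory pre-prioritized."""
    transition = (actor_id, step)             # placeholder for (s, a, r, s')
    td_error = random.random()                # stand-in for a local TD-error
    shared_replay.append((abs(td_error), transition))

def learner_step(batch_size=4):
    """The learner samples transitions proportionally to their priority."""
    weights = [p for p, _ in shared_replay]
    batch = random.choices(shared_replay, weights=weights, k=batch_size)
    return [t for _, t in batch]

for actor_id in range(3):          # three "actors" filling the shared memory
    for step in range(5):
        actor_step(actor_id, step)

batch = learner_step()
```

In the actual architecture the append and the sampling happen over the network, and the learner also sends updated priorities back to the replay memory after each gradient step.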

Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay

Haiyan Yin, Sinno Pan
Furthermore, we propose a new sampling framework termed hierarchical prioritized experience replay to selectively choose experiences from the replay memories of each task domain to perform learning on  ...  Second, we propose hierarchical prioritized experience replay to enhance the benefit of prioritization by regularizing the distribution of the sampled experiences from each domain.  ...  However, prioritized replay introduces distribution bias to the sampled experiences, which means that the original state distribution cannot be preserved.  ... 
doi:10.1609/aaai.v31i1.10733

Experience Replay with Likelihood-free Importance Weights [article]

Samarth Sinha and Jiaming Song and Animesh Garg and Stefano Ermon
2020 arXiv   pre-print
We use a likelihood-free density ratio estimator over the replay buffer to assign the prioritization weights.  ...  distribution of the current policy.  ...  Prioritized Experience Replay based on Stationary Distributions Assume that d, the distribution the replay buffer D is sampled from, is supported on the entire space S × A, and that we have infinite samples  ... 
arXiv:2006.13169v1

Prioritized Experience Replay [article]

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
2016 arXiv   pre-print
DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.  ...  In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently.  ...  PRIORITIZED REPLAY Using a replay memory leads to design choices at two levels: which experiences to store, and which experiences to replay (and how to do so).  ... 
arXiv:1511.05952v4
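The Schaul et al. entry above is the canonical proportional-prioritization scheme: a transition is replayed with probability P(i) = p_i^α / Σ_k p_k^α, and importance-sampling weights correct the bias that non-uniform replay introduces. A minimal sketch follows; the class and parameter names are illustrative, and a production version would use a sum-tree for O(log n) sampling rather than this O(n) loop.

```python
import numpy as np

rng = np.random.default_rng(0)

class ProportionalReplay:
    """Minimal proportional prioritized replay (no sum-tree, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.eps = eps            # keeps every transition sampleable
        self.buffer = []
        self.priorities = []

    def add(self, transition, td_error):
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = rng.choice(len(self.buffer), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # normalizing by the max keeps updates bounded, as in the paper.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights
```

After a learning step, the sampled transitions' priorities would be refreshed from their new TD errors via the returned indices.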

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay [article]

Dogan C. Cicek, Enes Duran, Baturay Saglam, Furkan B. Mutlu, Suleyman S. Kozat
2021 arXiv   pre-print
In this paper, we develop a novel algorithm, Batch Prioritized Experience Replay via KL Divergence (KLPER), which prioritizes batches of transitions rather than directly prioritizing each transition.  ...  Therefore, experience replay prioritization algorithms recalculate the significance of a transition when the corresponding transition is sampled, to gain computational efficiency.  ...  In this paper, we introduce a novel experience replay prioritization method, Batch Prioritized Experience Replay via KL Divergence (KLPER).  ... 
arXiv:2111.01865v2

Bias-Reduced Hindsight Experience Replay with Virtual Goal Prioritization [article]

Binyamin Manela, Armin Biess
2021 arXiv   pre-print
Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions.  ...  First, we prioritize virtual goals from which the agent will learn more valuable information.  ...  In all algorithms we used prioritized experience replay (PER) [16].  ... 
arXiv:1905.05498v5

Curiosity-Driven Experience Prioritization via Density Estimation [article]

Rui Zhao, Volker Tresp
2020 arXiv   pre-print
In our experiments, we combined CDP with Deep Deterministic Policy Gradient (DDPG), with or without Hindsight Experience Replay (HER).  ...  To address this problem, we propose a novel Curiosity-Driven Prioritization (CDP) framework to encourage the agent to over-sample those trajectories that have rare achieved goal states.  ...  Experience Replay To the best of our knowledge, the most similar method to CDP is Prioritized Experience Replay (PER) [41].  ... 
arXiv:1902.08039v3

A DPDK-Based Acceleration Method for Experience Sampling of Distributed Reinforcement Learning [article]

Masaki Furukawa, Hiroki Matsutani
2022 arXiv   pre-print
...  latencies for prioritized experience sampling by 21.9% to 29.1%.  ...  As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication  ...  Ape-X introduces a prioritized experience replay for large-scale distributed reinforcement learning systems that consist of Actor processes, an experience replay memory, and a Learner process.  ... 
arXiv:2110.13506v2

Advances in Experience Replay [article]

Tracy Wan, Neil Xu
2018 arXiv   pre-print
This project combines recent advances in experience replay techniques, namely, Combined Experience Replay (CER), Prioritized Experience Replay (PER), and Hindsight Experience Replay (HER).  ...  CER always adds the most recent experience to the batch. PER chooses which experiences should be replayed based on how beneficial they will be towards learning.  ...  Prioritized Experience Replay (PER) In regular experience replay, all transitions are sampled uniformly.  ... 
arXiv:1805.05536v1
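The CER rule quoted above ("always adds the most recent experience to the batch") is simple enough to sketch directly; the helper name below is illustrative, not from the paper.

```python
import random

def cer_sample(buffer, batch_size, rng=random):
    """Combined Experience Replay: a uniform minibatch that always
    contains the most recently added transition."""
    assert len(buffer) >= batch_size
    # Sample the rest of the batch uniformly, without the newest transition,
    # then append the newest one so it is guaranteed to be replayed.
    batch = rng.sample(buffer[:-1], batch_size - 1) if len(buffer) > 1 else []
    batch.append(buffer[-1])
    return batch
```

Because the newest transition is excluded from the uniform draw and appended once, the batch never contains it twice.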

Double Prioritized State Recycled Experience Replay [article]

Fanchen Bu, Dong Eui Chang
2020 arXiv   pre-print
A prior work called prioritized experience replay was developed in which experiences are prioritized, so as to replay seemingly more important experiences more frequently.  ...  In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, prioritizing the experiences in both the training stage and the storing stage, as well as replacing the experiences  ...  palliate the non-stationary distribution problem.  ... 
arXiv:2007.03961v3

Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

Tung M. Luu, Chang D. Yoo
2021 IEEE Access  
This paper proposes a method for prioritizing replay experience, referred to as Hindsight Goal Ranking (HGR), to overcome the limitation of Hindsight Experience Replay (HER) that generates hindsight  ...  The proposed method, combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, accelerates learning significantly compared to learning without any prioritization  ...  Also, in [41], the extension of Prioritized Experience Replay (PER) tries to prioritize experiences instead.  ... 
doi:10.1109/access.2021.3069975

Prioritized Level Replay [article]

Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
2021 arXiv   pre-print
We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future  ...  The following sections describe how Prioritized Level Replay updates the replay distribution P_replay(l|Λ_seen), namely through level scoring and staleness-aware prioritization.  ...  (d) to determine whether to replay a level sampled from the replay distribution P_replay(l|Λ_seen) or to experience a new, unseen level from Λ_train, according to some distribution P_new(l|Λ_train  ... 
arXiv:2010.03934v4

One Pass ImageNet [article]

Huiyi Hu, Ang Li, Daniele Calandriello, Dilan Gorur
2021 arXiv   pre-print
The idea of prioritized experience replay [15] is to add a priority score to each example and sample from the buffer according to the probability distribution normalized from the priority scores.  ...  Let p(x) be the distribution of the original data and q(x) be the distribution in the replay buffer. The original objective is E_p[ℓ(x; θ)].  ... 
arXiv:2111.01956v1
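The snippet above states the correction that makes non-uniform replay unbiased: E_p[ℓ(x; θ)] = E_q[(p(x)/q(x)) ℓ(x; θ)], i.e. samples drawn from the replay distribution q must be reweighted by the density ratio p/q. A toy numeric check of that identity on a discrete example (the losses and priorities below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Five examples with per-example loss l(x), an original data distribution p,
# and a priority-skewed replay distribution q.
loss = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = np.full(5, 0.2)                      # original data distribution (uniform)
priority = np.array([0.1, 0.1, 0.2, 0.6, 1.0])
q = priority / priority.sum()            # replay distribution from priorities

exact = (p * loss).sum()                 # E_p[l(x)], computed directly

# Monte-Carlo estimate: sample from q, reweight each term by p(x)/q(x).
idx = rng.choice(5, size=200_000, p=q)
estimate = np.mean((p[idx] / q[idx]) * loss[idx])
```

Without the p/q factor the estimate would converge to E_q[ℓ] instead, which is exactly the distribution bias that prioritized replay introduces.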

Combining Experience Replay with Exploration by Random Network Distillation [article]

Francesco Sovrano
2019 arXiv   pre-print
We are able to do this by using a new technique named Prioritized Oversampled Experience Replay (POER), built upon a definition of which experience is important and useful to replay.  ...  In more detail, we show how to efficiently combine Intrinsic Rewards with Experience Replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and consequently better results  ...  maybe due to the insufficient amount of training time, or due to the adopted replay  ... 
arXiv:1905.07579v1

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning [article]

Rui Zhao, Xudong Sun, Volker Tresp
2020 arXiv   pre-print
For evaluation of this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay.  ...  Secondly, we developed a maximum entropy-based prioritization framework to optimize the proposed objective.  ...  Prioritized experience replay was introduced by Schaul et al. (2016) as an improvement to the experience replay in DQN (Mnih et al., 2015).  ... 
arXiv:1905.08786v3