#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
[article]
2017
arXiv
pre-print
Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs ...
In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep ...
We gratefully acknowledge the support of the NSF through grant IIS-1619362 and of the ARC through a Laureate Fellowship (FL110100281) and through the ARC Centre of Excellence for Mathematical and Statistical ...
arXiv:1611.04717v3
fatcat:pt5t26wxhrcc7kl3ieo3jcvkgu
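The entry above describes generalizing classic count-based bonuses to high-dimensional or continuous states by hashing them into discrete codes. A minimal sketch of that idea, assuming a user-supplied hash function `phi` (e.g., a SimHash of state features) and a hypothetical bonus coefficient `beta`; this is an illustration, not the paper's implementation:

```python
from collections import defaultdict

import numpy as np


class HashCountBonus:
    """Count-based exploration bonus over hashed states (illustrative sketch)."""

    def __init__(self, phi, beta=0.01):
        self.phi = phi                    # maps a state to a discrete, hashable code
        self.beta = beta                  # bonus scale (hypothetical default)
        self.counts = defaultdict(int)    # visit counts per hash code

    def bonus(self, state):
        code = self.phi(state)
        self.counts[code] += 1
        # states whose hash bucket has few visits receive a larger bonus
        return self.beta / np.sqrt(self.counts[code])
```

A training loop would then add this bonus to the environment reward, e.g. `r_total = r_env + bonus_model.bonus(next_obs)`.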
Exploration for Countering the Episodic Memory
2022
Computational Intelligence and Neuroscience
In low-dimensional Markov decision processes, tabular reinforcement learning combined with count-based exploration works well for states of the Markov decision process that can be easily exhausted ...
Reinforcement learning is a prominent computational approach for goal-directed learning and decision making, and exploration plays an important role in improving the agent's performance in reinforcement ...
Count-Based Exploration and Episodic Memory: In low-dimensional Markov decision processes, tabular reinforcement learning combined with count-based exploration works well for states of the Markov decision ...
doi:10.1155/2022/7286186
pmid:35419049
pmcid:PMC8995543
fatcat:3ocvtntowrh2pp6h5uutmpqnfq
BeBold: Exploration Beyond the Boundary of Explored Regions
[article]
2020
arXiv
pre-print
Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR). ...
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. ...
ACKNOWLEDGEMENTS: This project occurred under the BAIR Commons at UC Berkeley, and we thank the Commons sponsors for their support. ...
arXiv:2012.08621v1
fatcat:fyq73b45rzgeva5mhr5yyyv43u
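The snippet above rewards the agent for pushing beyond the boundary of already-explored regions. A rough sketch of one such criterion, under my assumption (not the paper's exact definition) that the bonus is a clipped difference of inverse visitation counts between consecutive states:

```python
from collections import defaultdict


class BoundaryBonus:
    """Clipped difference of inverse visit counts (illustrative assumption)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, state, next_state):
        self.counts[next_state] += 1            # we just entered next_state
        n_curr = max(self.counts[state], 1)     # guard against division by zero
        n_next = self.counts[next_state]
        # positive only when the successor state is rarer than the current one,
        # i.e. when the transition crosses the frontier of visited states
        return max(1.0 / n_next - 1.0 / n_curr, 0.0)
```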
UCB Exploration via Q-Ensembles
[article]
2017
arXiv
pre-print
We show how an ensemble of Q^*-functions can be leveraged for more effective exploration in deep reinforcement learning. ...
We build on well established algorithms from the bandit setting, and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). ...
EX2: Exploration with exemplar models for deep reinforcement learning. arXiv preprint arXiv:1703.01260, 2017. ...
arXiv:1706.01502v3
fatcat:v3ury7x35zcntiiij4niyrcebi
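The entry above adapts bandit-style upper-confidence bounds to Q-learning using an ensemble of Q-functions. A small sketch of the action-selection rule this suggests, with a hypothetical exploration weight `lam`:

```python
import numpy as np


def ucb_action(q_values, lam=1.0):
    """Select an action from ensemble Q-estimates via an upper-confidence rule.

    q_values: array of shape (K, A) holding Q-estimates from K ensemble members
              for each of A actions at the current state.
    """
    mean = q_values.mean(axis=0)   # ensemble mean Q per action
    std = q_values.std(axis=0)     # ensemble disagreement per action
    return int(np.argmax(mean + lam * std))
```

Usage: `a = ucb_action(np.stack([q(state) for q in ensemble]))`.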
Offline Reinforcement Learning as Anti-Exploration
[article]
2021
arXiv
pre-print
The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. ...
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. ...
Foote, A. Stooke, X. Chen, Y. Duan, J. Schulman, F. De Turck, and P. Abbeel. #Exploration: A study of count-based exploration for deep reinforcement learning. ...
arXiv:2106.06431v1
fatcat:crppp6covnc7plgf6uttkwyili
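The abstract above flips the usual recipe: a novelty bonus is subtracted from the reward so the offline policy stays on the dataset support. A sketch of that shaping, using an RND-style prediction-error bonus purely as a stand-in for whatever prediction-based bonus the paper actually uses (names and scales here are assumptions):

```python
import torch
import torch.nn as nn


class PredictionBonus(nn.Module):
    """Prediction-error novelty estimate (RND-style stand-in)."""

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Linear(obs_dim, feat_dim)     # frozen random projection
        self.predictor = nn.Linear(obs_dim, feat_dim)  # trained to match the target
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, obs):
        return ((self.predictor(obs) - self.target(obs)) ** 2).mean(dim=-1)


def anti_exploration_reward(reward, bonus, alpha=1.0):
    # subtract (rather than add) the novelty bonus: out-of-distribution
    # state-actions are penalized, keeping the policy near the data
    return reward - alpha * bonus
```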
An Exploration of Embodied Visual Exploration
[article]
2020
arXiv
pre-print
We then perform a thorough empirical study of the four state-of-the-art paradigms using the proposed framework with two photorealistic simulated 3D environments, a state-of-the-art exploration architecture ...
Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? ...
Acknowledgements UT Austin is supported in part by DARPA Lifelong Learning Machines and the GCP Research Credits Program. ...
arXiv:2001.02192v2
fatcat:oha4n2wsfrhknamznly2zyu7vy
MADE: Exploration via Maximizing Deviation from Explored Regions
[article]
2021
arXiv
pre-print
As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. ...
In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. ...
Acknowledgements The authors are grateful to Andrea Zanette for helpful discussions. The authors thank Alekh Agarwal, Michael Henaff, Sham Kakade, and Wen Sun for providing their code. ...
arXiv:2106.10268v1
fatcat:aidauhhgcfczvhgapa2bhbih74
Exploration in deep reinforcement learning: A survey
2022
Information Fusion
This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. ...
In such a scenario, it is challenging for reinforcement learning to learn rewards and actions association. Thus more sophisticated exploration methods need to be devised. ...
Note that this review is intended for beginners in exploration for deep reinforcement learning; thus, the focus is on the breadth of approaches and their relatively simplified description. ...
doi:10.1016/j.inffus.2022.03.003
fatcat:q4uwqd26qjfyzivtyzqhf7u5cm
A Survey of Exploration Methods in Reinforcement Learning
[article]
2021
arXiv
pre-print
In this article, we provide a survey of modern exploration methods in (Sequential) reinforcement learning, as well as a taxonomy of exploration methods. ...
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning. ...
In the next section, we propose a method of categorization for exploration techniques in reinforcement learning. ...
arXiv:2109.00157v2
fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i
Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning
[article]
2021
arXiv
pre-print
Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of ...
We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. ...
Acknowledgements We thank many people for valuable conversations early in this project, in particular Nicolas Heess, Pablo Sprechmann, Michael Neunert, Jonas Degrave, Jan Humplik, David Abel, Alessandro ...
arXiv:2101.09458v2
fatcat:6tv26wgtnbgwvcpf7y5uaotu3u
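The snippet above notes that many current algorithms explore only through an action-entropy term. For reference, a tiny sketch of what that default looks like as a discounted, entropy-regularized return (the coefficient `alpha` is hypothetical):

```python
def entropy_regularized_return(rewards, entropies, alpha=0.01, gamma=0.99):
    """Discounted return plus a per-step entropy bonus, the 'default'
    exploration mechanism the abstract refers to."""
    g = 0.0
    for r, h in reversed(list(zip(rewards, entropies))):
        g = r + alpha * h + gamma * g
    return g
```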
Designing Deep Reinforcement Learning for Human Parameter Exploration
2021
ACM Transactions on Computer-Human Interaction
In this article, we propose to investigate artificial agents using deep reinforcement learning to explore parameter spaces in partnership with users for sound design. ...
We describe a series of user-centred studies to probe the creative benefits of these agents and adapting their design to exploration. ...
We thank Benjamin Matuszewski, Jean-Philippe Lambert, and Adèle Pécout for their support in designing the studies. ...
doi:10.1145/3414472
fatcat:owxyc3nkojhtjczoetoa7pttdu
Parameterized Exploration
[article]
2019
arXiv
pre-print
a Markov decision process based on a mobile health (mHealth) study. ...
We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. ...
(..., 2017), in which exploration in the deep reinforcement learning setting is induced by adding parameterized noise to the weights of the neural network, and tuning the parameters governing the exploration ...
arXiv:1907.06090v1
fatcat:jfgnkyurpfb3dgg2ewmakpkmqu
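The abstract above tunes a parameterized exploration schedule using a model of the decision process. A hypothetical example of such a schedule (an exponentially decaying epsilon for epsilon-greedy; the parameter names are illustrative, not the paper's):

```python
import numpy as np


def epsilon_schedule(t, eps0=1.0, decay=0.01, eps_min=0.05):
    """Exploration probability at step t under a tunable (eps0, decay) schedule."""
    return max(eps_min, eps0 * np.exp(-decay * t))
```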
Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning
[article]
2018
arXiv
pre-print
We illustrate the benefit of these ideas by introducing a novel algorithm, Strategic Object Oriented Reinforcement Learning (SOORL), that outperforms state-of-the-art algorithms in the game of Pitfall! ...
Inspired by this, we investigate two issues in leveraging model-based RL for sample efficiency. ...
INTRODUCTION: The coupling of deep neural networks and reinforcement learning has led to exciting advances, enabling reinforcement learning agents that can reach human-level performance in many Atari 2600 ...
arXiv:1806.00175v2
fatcat:f4cbjoacq5fwlorjwgvjes7mgi
Learning to Explore with Meta-Policy Gradient
[article]
2018
arXiv
pre-print
With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks. ...
The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. ...
Acknowledgement We appreciate Kliegl Markus for his insightful discussions and helpful comments. ...
arXiv:1803.05044v2
fatcat:gxgkv6uljfbzjmfpwgzk6itof4
Count-Based Exploration with Neural Density Models
[article]
2017
arXiv
pre-print
Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning. ...
This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge ...
here, and Audrunas Gruslys especially for providing the Reactor agent. ...
arXiv:1703.01310v2
fatcat:cdp4czchnremdj7yazowht6eoi
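The entry above builds on the pseudo-count of Bellemare et al. (2016), which turns a density model into a generalized visit count. A compact sketch of that quantity and the resulting bonus (the scale `beta` is a hypothetical default):

```python
def pseudo_count(rho, rho_prime):
    """Pseudo-count derived from a density model.

    rho:       model probability of state x before observing it
    rho_prime: model probability of x after one more observation of x
    """
    gain = rho_prime - rho
    if gain <= 0:
        return float("inf")   # the model no longer learns from x: treat it as well known
    return rho * (1.0 - rho_prime) / gain


def exploration_bonus(rho, rho_prime, beta=0.05, eps=0.01):
    # the pseudo-count plays the role of a visit count in a 1/sqrt(N) bonus
    return beta / (pseudo_count(rho, rho_prime) + eps) ** 0.5
```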
Showing results 1 — 15 out of 73,636 results