#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning [article]

Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
2017 arXiv   pre-print
Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs  ...  In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep  ...  We gratefully acknowledge the support of the NSF through grant IIS-1619362 and of the ARC through a Laureate Fellowship (FL110100281) and through the ARC Centre of Excellence for Mathematical and Statistical  ... 
arXiv:1611.04717v3 fatcat:pt5t26wxhrcc7kl3ieo3jcvkgu

Exploration for Countering the Episodic Memory

Rong Zhou, Yuan Wang, Xiwen Zhang, Chao Wang, Ahmed Mostafa Khalil
2022 Computational Intelligence and Neuroscience  
In low-dimensional Markov decision processes, table reinforcement learning incorporated within count-based exploration works well for states of the Markov decision processes that can be easily exhausted  ...  Reinforcement learning is a prominent computational approach for goal-directed learning and decision making, and exploration plays an important role in improving the agent's performance in reinforcement  ...  Count-Based Exploration and Episodic Memory In low-dimensional Markov decision processes, table reinforcement learning incorporated within count-based exploration works well for states of the Markov decision  ... 
doi:10.1155/2022/7286186 pmid:35419049 pmcid:PMC8995543 fatcat:3ocvtntowrh2pp6h5uutmpqnfq

BeBold: Exploration Beyond the Boundary of Explored Regions [article]

Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian
2020 arXiv   pre-print
Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR).  ...  The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.  ...  ACKNOWLEDGEMENTS This project occurred under the BAIR Commons at UC-Berkeley and we thank Commons sponsors for their support.  ... 
arXiv:2012.08621v1 fatcat:fyq73b45rzgeva5mhr5yyyv43u

UCB Exploration via Q-Ensembles [article]

Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman
2017 arXiv   pre-print
We show how an ensemble of Q^*-functions can be leveraged for more effective exploration in deep reinforcement learning.  ...  We build on well established algorithms from the bandit setting, and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB).  ...  EX2: Exploration with exemplar models for deep reinforcement learning. arXiv preprint arXiv:1703.01260, 2017.  ... 
arXiv:1706.01502v3 fatcat:v3ury7x35zcntiiij4niyrcebi
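
As an illustration of the upper-confidence-bound idea in the snippet above, here is a minimal sketch that scores each action by the ensemble mean plus a multiple of the ensemble standard deviation. The function name `ucb_action`, the list-of-lists layout of `q_values`, and the weight `lam` are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def ucb_action(q_values, lam=1.0):
    """Pick an action using an ensemble-based upper confidence bound.

    q_values[k][a] is the Q estimate of ensemble member k for action a
    (hypothetical layout). lam weights the disagreement term (assumption).
    """
    n_actions = len(q_values[0])
    scores = []
    for a in range(n_actions):
        estimates = [member[a] for member in q_values]
        # Mean is the exploitation term; spread across members is the bonus.
        scores.append(mean(estimates) + lam * pstdev(estimates))
    # Ties resolve to the lowest action index.
    return max(range(n_actions), key=scores.__getitem__)


# Two ensemble members, two actions. Both actions have the same mean,
# but the members disagree only about action 1.
q = [[1.0, 0.0], [1.0, 2.0]]
greedy = ucb_action(q, lam=0.0)
optimistic = ucb_action(q, lam=1.0)
```

With `lam=0.0` the rule reduces to greedy selection on the ensemble mean; a positive `lam` favours actions on which the ensemble members disagree.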

Offline Reinforcement Learning as Anti-Exploration [article]

Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist
2021 arXiv   pre-print
The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset.  ...  Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system.  ...  Foote, A. Stooke, X. Chen, Y. Duan, J. Schulman, F. De Turck, and P. Abbeel. # exploration: A study of count-based exploration for deep reinforcement learning.  ... 
arXiv:2106.06431v1 fatcat:crppp6covnc7plgf6uttkwyili
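
The core idea quoted in the snippet above (subtract the bonus instead of adding it) can be sketched in a few lines. The function names and the scale `alpha` are hypothetical, and the bonus value is treated as a given number; how the prediction-based bonus is computed is not shown in the snippet and is not assumed here.

```python
def exploration_reward(reward, bonus, alpha=1.0):
    """Online exploration: the bonus is added, rewarding novel state-actions."""
    return reward + alpha * bonus


def anti_exploration_reward(reward, bonus, alpha=1.0):
    """Offline setting: the same bonus is subtracted, penalising
    state-actions that look unfamiliar relative to the dataset."""
    return reward - alpha * bonus


# A large bonus (unfamiliar state-action) lowers the offline reward.
penalised = anti_exploration_reward(1.0, 0.25, alpha=2.0)
encouraged = exploration_reward(1.0, 0.25, alpha=2.0)
```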

An Exploration of Embodied Visual Exploration [article]

Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman
2020 arXiv   pre-print
We then perform a thorough empirical study of the four state-of-the-art paradigms using the proposed framework with two photorealistic simulated 3D environments, a state-of-the-art exploration architecture  ...  Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment?  ...  Acknowledgements UT Austin is supported in part by DARPA Lifelong Learning Machines and the GCP Research Credits Program.  ... 
arXiv:2001.02192v2 fatcat:oha4n2wsfrhknamznly2zyu7vy

MADE: Exploration via Maximizing Deviation from Explored Regions [article]

Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell
2021 arXiv   pre-print
As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.  ...  In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards.  ...  Acknowledgements The authors are grateful to Andrea Zanette for helpful discussions. The authors thank Alekh Agarwal, Michael Henaff, Sham Kakade, and Wen Sun for providing their code.  ... 
arXiv:2106.10268v1 fatcat:aidauhhgcfczvhgapa2bhbih74

Exploration in deep reinforcement learning: A survey

Pawel Ladosz, Lilian Weng, Minwoo Kim, Hyondong Oh
2022 Information Fusion  
This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems.  ...  In such a scenario, it is challenging for reinforcement learning to learn rewards and actions association. Thus more sophisticated exploration methods need to be devised.  ...  Note that this review is intended for beginners in exploration for deep reinforcement learning; thus, the focus is on the breadth of approaches and their relatively simplified description.  ... 
doi:10.1016/j.inffus.2022.03.003 fatcat:q4uwqd26qjfyzivtyzqhf7u5cm

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv   pre-print
In this article, we provide a survey of modern exploration methods in (Sequential) reinforcement learning, as well as a taxonomy of exploration methods.  ...  Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning.  ...  In the next section, we propose a method of categorization for exploration techniques in reinforcement learning.  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i

Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning [article]

William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller
2021 arXiv   pre-print
Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of  ...  We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime.  ...  Acknowledgements We thank many people for valuable conversations early in this project, in particular Nicolas Heess, Pablo Sprechmann, Michael Neunert, Jonas Degrave, Jan Humplik, David Abel, Alessandro  ... 
arXiv:2101.09458v2 fatcat:6tv26wgtnbgwvcpf7y5uaotu3u

Designing Deep Reinforcement Learning for Human Parameter Exploration

Hugo Scurto, Bavo Van Kerrebroeck, Baptiste Caramiaux, Frédéric Bevilacqua
2021 ACM Transactions on Computer-Human Interaction  
In this article, we propose to investigate artificial agents using deep reinforcement learning to explore parameter spaces in partnership with users for sound design.  ...  We describe a series of user-centred studies to probe the creative benefits of these agents and adapting their design to exploration.  ...  We thank Benjamin Matuszewski, Jean-Philippe Lambert, and Adèle Pécout for their support in designing the studies.  ... 
doi:10.1145/3414472 fatcat:owxyc3nkojhtjczoetoa7pttdu

Parameterized Exploration [article]

Jesse Clifton, Lili Wu, Eric Laber
2019 arXiv   pre-print
a Markov decision process based on a mobile health (mHealth) study.  ...  We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems.  ...  ., 2017) , in which exploration in the deep reinforcement learning setting is induced by adding parameterized noise to the weights of the neural network, and tuning the parameters governing the exploration  ... 
arXiv:1907.06090v1 fatcat:jfgnkyurpfb3dgg2ewmakpkmqu

Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning [article]

Ramtin Keramati, Jay Whang, Patrick Cho, Emma Brunskill
2018 arXiv   pre-print
We illustrate the benefit of these ideas by introducing a novel algorithm, Strategic Object Oriented Reinforcement Learning (SOORL), that outperforms state-of-the-art algorithms in the game of Pitfall!  ...  Inspired by this, we investigate two issues in leveraging model-based RL for sample efficiency.  ...  INTRODUCTION The coupling of deep neural networks and reinforcement learning has led to exciting advances, enabling reinforcement learning agents that can reach human-level performance in many Atari2600  ... 
arXiv:1806.00175v2 fatcat:f4cbjoacq5fwlorjwgvjes7mgi

Learning to Explore with Meta-Policy Gradient [article]

Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng
2018 arXiv   pre-print
With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks.  ...  The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy.  ...  Acknowledgement We appreciate Kliegl Markus for his insightful discussions and helpful comments.  ... 
arXiv:1803.05044v2 fatcat:gxgkv6uljfbzjmfpwgzk6itof4

Count-Based Exploration with Neural Density Models [article]

Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
2017 arXiv   pre-print
Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning.  ...  This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge  ...  here, and Audrunas Gruslys especially for providing the Reactor agent.  ... 
arXiv:1703.01310v2 fatcat:cdp4czchnremdj7yazowht6eoi
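
For readers unfamiliar with the pseudo-count mentioned in the snippet above, the sketch below computes it from two density-model outputs: the probability `rho` assigned to a state before observing it, and the recoding probability `rho_next` assigned after one more observation. The function names, the scale `beta`, and the small constant inside the square root are assumptions made for this example.

```python
import math


def pseudo_count(rho, rho_next):
    """Pseudo-count N = rho * (1 - rho_next) / (rho_next - rho).

    Requires rho_next > rho, i.e. the model's probability of the state
    increases after observing it once more.
    """
    if rho_next <= rho:
        raise ValueError("rho_next must exceed rho")
    return rho * (1.0 - rho_next) / (rho_next - rho)


def count_bonus(rho, rho_next, beta=0.1, eps=0.01):
    """Exploration bonus that shrinks as the pseudo-count grows.
    beta and eps are illustrative values, not taken from the source."""
    return beta / math.sqrt(pseudo_count(rho, rho_next) + eps)


# Sanity check with an empirical-frequency model: a state seen 3 times
# in 10 steps has rho = 3/10 and rho_next = 4/11, so the pseudo-count
# should recover the true count of 3.
recovered = pseudo_count(3 / 10, 4 / 11)
```

With an empirical-frequency density model the pseudo-count reduces to the ordinary visit count, which is why it can stand in for counts in non-tabular settings.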