
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment [article]

Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
2021 arXiv   pre-print
We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than ϵ-greedy exploration.  ...  This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE).  ...  on an earlier draft of this paper.  ... 
arXiv:1908.02388v3 fatcat:5o52r4aqnrd4ljsiqxklv4t3ua

Self-Supervised Exploration via Latent Bayesian Surprise [article]

Pietro Mazzaglia, Ozan Catal, Tim Verbelen, Bart Dhoedt
2021 arXiv   pre-print
Generating rewards in self-supervised way, by inspiring the agent with an intrinsic desire to learn and explore the environment, might induce more general behaviours.  ...  In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning, computed as the Bayesian surprise with respect to a latent state variable, learnt by reconstructing fixed  ...  ACKNOWLEDGMENTS This research received funding from the Flemish Government (AI Research Program).  ... 
arXiv:2104.07495v1 fatcat:uqrd7ueirvgv5fibas3nyjjqhe
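The Bayesian surprise described above is typically measured as a KL divergence between a posterior and a prior over the latent variable. As a rough, hypothetical sketch (not the authors' actual model, which learns the latent by reconstruction), the diagonal-Gaussian case can be written as:

```python
import numpy as np

def gaussian_kl(mu_post, var_post, mu_prior, var_prior):
    """KL(posterior || prior) for diagonal Gaussians, a common concrete
    form of Bayesian surprise over a latent state variable."""
    return 0.5 * float(np.sum(
        np.log(var_prior / var_post)
        + (var_post + (mu_post - mu_prior) ** 2) / var_prior
        - 1.0
    ))

# Shifting the posterior mean by 1 with unit variances yields a surprise of 0.5.
print(gaussian_kl(np.array([1.0]), np.array([1.0]),
                  np.array([0.0]), np.array([1.0])))  # → 0.5
```

In a full agent, this scalar would be added to the extrinsic reward at each step; the scaling of the bonus is a tuning choice not specified here.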

Clustered Reinforcement Learning [article]

Xiao Ma, Shen-Yi Zhao, Wu-Jun Li
2019 arXiv   pre-print
Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards.  ...  CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given  ...  The agent escapes the valley from the right side and receives a reward of +0.001 at other positions. One snapshot of this task is shown in Figure 1 (a). Arcade Learning Environment.  ... 
arXiv:1906.02457v1 fatcat:k2dx4upchngxlfxub7zl6hlvea
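A novelty-plus-quality cluster bonus of the kind described in this snippet can be sketched as follows. This is an illustrative reading, not CRL's actual formula: the centroid source (e.g. k-means over collected states), the `beta` and `w` weights, and the additive combination rule are all assumptions.

```python
import numpy as np

class ClusterBonus:
    """Hypothetical sketch of a cluster-based exploration bonus: assign each
    state to its nearest centroid, then combine novelty (rarely visited
    clusters score higher) with quality (average extrinsic reward seen
    in that cluster)."""

    def __init__(self, centroids, beta=0.1, w=0.5):
        self.centroids = np.asarray(centroids, dtype=float)
        self.counts = np.zeros(len(self.centroids))
        self.reward_sums = np.zeros(len(self.centroids))
        self.beta, self.w = beta, w

    def bonus(self, state, extrinsic_reward):
        # Nearest-centroid assignment.
        dists = np.linalg.norm(self.centroids - np.asarray(state), axis=1)
        k = int(np.argmin(dists))
        self.counts[k] += 1
        self.reward_sums[k] += extrinsic_reward
        novelty = self.beta / np.sqrt(self.counts[k])
        quality = self.reward_sums[k] / self.counts[k]
        return novelty + self.w * quality
```

On the first visit to a fresh, zero-reward cluster the bonus reduces to `beta`, then decays with the square root of the visit count.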

Ddo, a Generic and Efficient Framework for MDD-Based Optimization

Xavier Gillard, Pierre Schaus, Vianney Coppé
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
As an additional benefit, our ddo library is able to exploit parallel computing without imposing any constraint on the user (apart from memory safety).  ...  To the best of our knowledge, this is the first public implementation of a generic library to solve combinatorial optimization problems with branch-and-bound MDD.  ...  Experimental Results: We evaluated FaSo on two difficult exploration Atari 2600 games from the Arcade Learning Environment  ...  rewards are large enough to encourage the agent to discover and visit  ... 
doi:10.24963/ijcai.2020/733 dblp:conf/ijcai/BougieI20 fatcat:3txp4ns3lnarjao6zitybjy5ta

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models [article]

Bradly C. Stadie, Sergey Levine, Pieter Abbeel
2015 arXiv   pre-print
We evaluate several more sophisticated exploration strategies, including Thompson sampling and Boltzmann exploration, and propose a new exploration method based on assigning exploration bonuses from a concurrently  ...  In addition to raw game-scores, we also develop an AUC-100 metric for the Atari Learning domain to evaluate the impact of exploration on this benchmark.  ...  The results for 14 games in the Arcade Learning Environment are presented in Table 1 .  ... 
arXiv:1507.00814v3 fatcat:tm24wkehojda5nq3zydrfphj7m

UCB Exploration via Q-Ensembles [article]

Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman
2017 arXiv   pre-print
We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB).  ...  We show how an ensemble of Q^*-functions can be leveraged for more effective exploration in deep reinforcement learning.  ...  We evaluate the algorithms on each Atari game of the Arcade Learning Environment (Bellemare et al. [4]).  ... 
arXiv:1706.01502v3 fatcat:v3ury7x35zcntiiij4niyrcebi
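The UCB rule over a Q-ensemble amounts to picking the action with the highest "mean plus scaled disagreement" across ensemble members. A minimal sketch of the selection step (the exploration coefficient `lam` and the toy ensemble values are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def ucb_action(q_values, lam=1.0):
    """Select the action maximizing mean + lam * std across a Q-ensemble.

    q_values: array of shape (ensemble_size, num_actions), one row of
    Q-estimates per ensemble member for the current state.
    """
    mean = q_values.mean(axis=0)
    std = q_values.std(axis=0)
    return int(np.argmax(mean + lam * std))

# Action 1 has a slightly lower mean but much higher ensemble disagreement,
# so the UCB rule prefers it for exploration.
ensemble = np.array([[1.0, 0.0],
                     [1.0, 1.8]])
print(ucb_action(ensemble, lam=1.0))  # → 1
```

Setting `lam=0` recovers greedy action selection on the ensemble mean, which makes the exploration/exploitation trade-off explicit in a single knob.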

Exploration in Feature Space for Reinforcement Learning [article]

Suraj Narayanan Sasikumar
2017 arXiv   pre-print
The infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL).  ...  The resulting ϕ-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the original state space.  ...  Acknowledgments At the culmination of two years of hard work, I would like to take this opportunity to acknowledge the role they have played in my life.  ... 
arXiv:1710.02210v1 fatcat:7ddmu3kjdjd6hkoqjfdz375cii

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv   pre-print
In this article, we provide a survey of modern exploration methods in (Sequential) reinforcement learning, as well as a taxonomy of exploration methods.  ...  Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning.  ...  Prediction error-based bonus In this category of exploration methods, the bonus term is computed based on the change in the agent's knowledge about the environment dynamics.  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i
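The prediction error-based bonus the survey describes rewards transitions that a learned forward model fails to predict. A minimal sketch, assuming the forward model's prediction is already available; the function name and `eta` scale are illustrative:

```python
import numpy as np

def prediction_error_bonus(predicted_next_state, actual_next_state, eta=0.01):
    """Intrinsic bonus proportional to the forward model's squared error:
    the more surprising the transition, the larger the reward, so the
    agent is drawn toward poorly modeled parts of the environment."""
    err = np.asarray(actual_next_state) - np.asarray(predicted_next_state)
    return eta * float(err @ err)

# A unit error in each of two state dimensions gives a bonus of 0.01 * 2.
print(prediction_error_bonus([0.0, 0.0], [1.0, 1.0]))  # → 0.02
```

As the model improves on familiar states the error, and hence the bonus, shrinks, which is what makes this family of bonuses self-annealing.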

Amplifying the Imitation Effect for Reinforcement Learning of UCAV's Mission Execution [article]

Gyeong Taek Lee, Chang Ouk Kim
2019 arXiv   pre-print
In addition, by adding an intrinsic penalty reward to the state that the RL agent frequently visits and using replay memory for learning the feature state when using an exploration bonus, the proposed  ...  We verified the exploration performance of the algorithm through experiments in a two-dimensional grid environment.  ...  One way to represent the coordinates in the learning environment is to use a one-hot encoding vector.  ... 
arXiv:1901.05856v1 fatcat:qrq2zrpe4ndejeqxeotvmrcbcm

Self-Imitation Advantage Learning [article]

Johan Ferret, Olivier Pietquin, Matthieu Geist
2020 arXiv   pre-print
We demonstrate the empirical effectiveness of SAIL on the Arcade Learning Environment, with a focus on hard exploration games.  ...  It was shown to improve the performance of on-policy actor-critic methods in several discrete control tasks.  ...  the practicality of our method in terms of simplicity, efficiency and performance on the Arcade Learning Environment [ALE, 8] benchmark, under several base off-policy RL methods.  ... 
arXiv:2012.11989v1 fatcat:u3h7ugxgzzf6zapvmzivfabg5u

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning [article]

Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
2017 arXiv   pre-print
These counts are then used to compute a reward bonus according to the classic count-based exploration theory.  ...  Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs  ...  This research was funded in part by ONR through a PECASE award. Yan Duan was also supported by a Berkeley AI Research lab Fellowship and a Huawei Fellowship.  ... 
arXiv:1611.04717v3 fatcat:pt5t26wxhrcc7kl3ieo3jcvkgu
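The counting scheme here maps (continuous) states to discrete hash codes and applies the classic count-based bonus beta / sqrt(n). A minimal SimHash-style sketch; the code length `k`, `beta`, and the random-projection hash are simplified assumptions rather than the paper's tuned configuration:

```python
import numpy as np
from collections import Counter

class SimHashBonus:
    """Count-based exploration bonus over hashed states:
    bonus(s) = beta / sqrt(n(phi(s))), where phi is a k-bit
    random-projection (SimHash-style) code of the state."""

    def __init__(self, state_dim, k=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # fixed random projection
        self.beta = beta
        self.counts = Counter()

    def bonus(self, state):
        # k-bit code: the sign pattern of the projected state.
        code = tuple((self.A @ np.asarray(state)) > 0)
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```

Nearby states tend to share a code, so the counter generalizes across states rather than treating every raw observation as unique.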

Count-Based Exploration with Neural Density Models [article]

Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
2017 arXiv   pre-print
Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning.  ...  This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge  ...  Acknowledgements The authors thank Tom Schaul, Olivier Pietquin, Ian Osband, Sriram Srinivasan, Tejas Kulkarni, Alex Graves, Charles Blundell, and Shimon Whiteson for invaluable feedback on the ideas presented  ... 
arXiv:1703.01310v2 fatcat:cdp4czchnremdj7yazowht6eoi

Unifying Count-Based Exploration and Intrinsic Motivation [article]

Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
2016 arXiv   pre-print
Specifically, we focus on the problem of exploration in non-tabular reinforcement learning.  ...  This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels.  ...  den Oord for their excellent feedback early and late in the writing, and Pierre-Yves Oudeyer and Yann Ollivier for pointing out additional connections to the literature.  ... 
arXiv:1606.01868v2 fatcat:5okxo4jxsngq3dundanacr3st4
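The pseudo-count of Bellemare et al. (2016) can be recovered from just two numbers: the density model's probability of a state before and after observing it once. A small sketch of that recoding step (the `exploration_bonus` scaling with beta and the 0.01 stabilizer follow the paper's form, but the constants are illustrative):

```python
def pseudo_count(rho_before, rho_after):
    """Pseudo-count N(x) = rho(x) * (1 - rho'(x)) / (rho'(x) - rho(x)),
    where rho is the density model's probability of x before observing
    it and rho' the probability after one more observation of x."""
    return rho_before * (1.0 - rho_after) / (rho_after - rho_before)

def exploration_bonus(rho_before, rho_after, beta=0.05):
    """Count-based bonus beta / sqrt(N(x) + 0.01) built on the pseudo-count."""
    return beta / (pseudo_count(rho_before, rho_after) + 0.01) ** 0.5

# Sanity check: for an empirical counter with 2 of 10 observations equal to x,
# rho = 2/10 and rho' = 3/11, and the pseudo-count recovers the true count 2.
print(pseudo_count(0.2, 3 / 11))  # → 2.0
```

The appeal is that any density model over raw pixels, not just a tabular counter, yields these two probabilities and hence a generalized visit count.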

Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning [article]

Yuanyi Zhong, Alexander Schwing, Jian Peng
2020 arXiv   pre-print
In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation  ...  The training curves of all methods (DDQN baseline, DDQN+Pred and DDQN+Pred+Bonus) evaluated in the Arcade Learning Environment [13] on the selected games are illustrated in Fig. 5, where the horizontal  ... 
arXiv:2002.09136v1 fatcat:bw3iyw3gnfhtlldylcxxdwcb7e

Contingency-Aware Exploration in Reinforcement Learning [article]

Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee
2019 arXiv   pre-print
This paper investigates whether learning contingency-awareness and controllable aspects of an environment can lead to better exploration in reinforcement learning.  ...  To investigate this question, we consider an instantiation of this hypothesis evaluated on the Arcade Learning Environment (ALE).  ...  Then, we add a count-based exploration bonus based on quantized observations.  ... 
arXiv:1811.01483v3 fatcat:2ock66anhfgehb7ihg667llfey