Generalization and Regularization in DQN [article]

Jesse Farebrother, Marlos C. Machado, Michael Bowling
2020 arXiv   pre-print
Despite regularization being largely underutilized in deep reinforcement learning, we show that it can, in fact, help DQN learn more general features.  ...  We then comprehensively evaluate the impact of dropout and ℓ_2 regularization, as well as the impact of reusing learned representations to improve the generalization capabilities of DQN.  ...  Taylor, Tom van de Wiele, and Marc G. Bellemare for useful discussions, as well as Vlad Mnih for feedback on a preliminary draft of the manuscript.  ... 
arXiv:1810.00123v3 fatcat:fatmy5auovfcximggvj3hrvlgq

Shallow Updates for Deep Reinforcement Learning [article]

Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
2017 arXiv   pre-print
We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method.  ...  Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains.  ...  Regularization: The general idea of applying regularization for feature selection, and to avoid overfitting is a common theme in machine learning.  ... 
arXiv:1705.07461v2 fatcat:vru4fauuaraw5g5edn6gkh4r2y

Deep Reinforcement Learning with Decorrelation [article]

Borislav Mavrin, Hengshuai Yao, Linglong Kong
2019 arXiv   pre-print
Further experiments on the losing games show that our decorrelation algorithms can win over DQN and QR-DQN with a fine-tuned regularization factor.  ...  In particular, ours performs better than DQN on 39 games with 4 close ties and lost only slightly on 6 games.  ...  Tile coding is a classical binary scheme for encoding states and generalizes in local sub-spaces (Albus, 1975).  ... 
arXiv:1903.07765v3 fatcat:ohfgejpvqzdsbauzqj4yr56n3m

Munchausen Reinforcement Learning [article]

Nino Vieillard, Olivier Pietquin, Matthieu Geist
2020 arXiv   pre-print
To add to this empirical study, we provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.  ...  We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns  ...  M-DQN is largely over DQN, and outperforms C51 both in mean and median.  ... 
arXiv:2007.14430v3 fatcat:cc6dnpzn4jddfarby74xxc2s6a

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning [article]

Hirohisa Watanabe, Mineto Tsukada, Hiroki Matsutani
2021 arXiv   pre-print
In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that output values of the neural network can be fit into a certain range  ...  DQNs require a large buffer and batch processing for an experience replay and rely on a backpropagation based iterative optimization, making them difficult to be implemented on resource-limited edge devices  ...  In this case, a generalization capability is improved by the L2 regularization and an output range is limited by the spectral normalization.  ... 
arXiv:2005.04646v3 fatcat:yklbzntnnzfvfoqo4obugr4mci

Episodic Memory Deep Q-Networks [article]

Zichuan Lin, Tianqi Zhao, Guangwen Yang, Lintao Zhang
2018 arXiv   pre-print
It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.  ...  Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN).  ...  In general RL problems, researchers prefer to use parametric methods (e.g. DQN, A3C [Mnih et al., 2016]) due to their good ability in generalizing to novel states in stochastic environment.  ... 
arXiv:1805.07603v1 fatcat:ti2fcmw6nnetxb6yispsdiqfzu

Episodic Memory Deep Q-Networks

Zichuan Lin, Tianqi Zhao, Guangwen Yang, Lintao Zhang
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.  ...  Experiments show that our proposed method leads to better sample efficiency and is more likely to find good policy.  ...  In general RL problems, researchers prefer to use parametric methods (e.g. DQN, A3C [Mnih et al., 2016]) due to their good ability in generalizing to novel states in stochastic environment.  ... 
doi:10.24963/ijcai.2018/337 dblp:conf/ijcai/LinZYZ18 fatcat:rnpz4237g5bo7eob7ffccl52xy

Sparse Bayesian Learning with Diagonal Quasi-Newton Method for Large Scale Classification [article]

Jiahua Luo, Chi-Man Vong, Jie Du
2021 arXiv   pre-print
Experimental results verify that DQN-SBL receives competitive generalization with a very sparse model and scales well to large-scale problems.  ...  This paper addresses these issues with a newly proposed diagonal Quasi-Newton (DQN) method for SBL called DQN-SBL where the inversion of big covariance matrix is ignored so that the complexity and memory  ...  The accuracy curves for DQN-RVM and RVM are very close in most of benchmarks under different 𝜎, except with small fluctuation in colon-2000, diabetes and madelon.  ... 
arXiv:2107.08195v2 fatcat:r2f47usnc5emdg7twfmraabriy

Learning Sparse Representations Incrementally in Deep Reinforcement Learning [article]

J. Fernando Hernandez-Garcia, Richard S. Sutton
2019 arXiv   pre-print
We investigate this question by employing several regularization techniques and observing how they affect sparsity of the representation learned by a DQN agent in two different benchmark domains.  ...  Our results show that with appropriate regularization it is possible to increase the sparsity of the representations learned by DQN agents.  ...  All the regularization methods used the same buffer size and target network update frequency as DQN.  ... 
arXiv:1912.04002v1 fatcat:2es7yx5uzngg3hycuhjqhdvlby

Count-Based Exploration with Neural Density Models [article]

Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
2017 arXiv   pre-print
This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge  ...  The result is a more practical and general algorithm requiring no special apparatus.  ...  here, and Audrunas Gruslys especially for providing the Reactor agent.  ... 
arXiv:1703.01310v2 fatcat:cdp4czchnremdj7yazowht6eoi

Multi-focus Attention Network for Efficient Deep Reinforcement Learning [article]

Jinyoung Choi, Beom-Jin Lee, Byoung-Tak Zhang
2017 arXiv   pre-print
In this paper, we propose a Multi-focus Attention Network (MANet) which mimics human ability to spatially abstract the low-level sensory input into multiple entities and attend to them simultaneously.  ...  In our experiments, MANet attains highest scores with significantly less experience samples.  ...  Acknowledgments This research was supported by a grant to Bio-Mimetic Robot Research Center Funded by Defense Acquisition Program Administration, and by Agency for Defense Development (UD130070ID).  ... 
arXiv:1712.04603v1 fatcat:7epcswht4ree7bkki75z3oy2wy

On the Reduction of Variance and Overestimation of Deep Q-Learning [article]

Mohammed Sabry, Amr M. A. Khalifa
2019 arXiv   pre-print
We further present experiments on some of the benchmark environments that demonstrate significant improvement of the stability of the performance and a reduction in variance and overestimation.  ...  In this paper, we examine a new methodology to solve these issues: we propose using Dropout techniques on the deep Q-Learning algorithm as a way to reduce variance and overestimation.  ...  The maximization of the action space in the Q-learning algorithm and the generalization errors in neural networks can lead to overestimation and variance in state-action values.  ... 
arXiv:1910.05983v1 fatcat:x6avbd6wfncu3oro6sultvomqm

Effective Exploration for Deep Reinforcement Learning via Bootstrapped Q-Ensembles under Tsallis Entropy Regularization [article]

Gang Chen and Yiming Peng and Mengjie Zhang
2018 arXiv   pre-print
Specifically, a general form of Tsallis entropy regularizer will be utilized to drive entropy-induced exploration based on efficient approximation of optimal action-selection policies.  ...  With the aim of improving sample efficiency and learning performance, we will develop a new DRL algorithm in this paper that seamlessly integrates entropy-induced and bootstrap-induced techniques for efficient  ...  Conclusions In this paper we studied entropy-induced environment exploration via deep Q-learning under general Tsallis entropy regularization.  ... 
arXiv:1809.00403v2 fatcat:kgl7sktmpbgghfns4g6ewfrtdy

Policy Distillation [article]

Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell
2016 arXiv   pre-print
In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while  ...  We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.  ...  A detailed results table is given in Appendix B. These results suggest that DQN could benefit from a reduced capacity model or regularization.  ... 
arXiv:1511.06295v2 fatcat:2jnp5mpncjhato6p53yqeva2ma

Deep Q-learning from Demonstrations [article]

Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou (+2 others)
2017 arXiv   pre-print
We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it  ...  takes PDD DQN 83 million steps to catch up to DQfD's performance.  ...  many others at DeepMind for insightful discussions, code contributions, and other efforts.  ... 
arXiv:1704.03732v4 fatcat:aojkn6wbozc6xlcfdfylqsfr6y