A Deep Q-Learning Agent for the L-Game with Variable Batch Training [article]

Petros Giannakopoulos, Yannis Cotronis
2018 arXiv   pre-print
We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states.  ...  We also employ variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training.  ...  Conclusion In this paper we developed a game playing agent based on Deep Q-Learning for a challenging board game.  ... 
arXiv:1802.06225v1 fatcat:v672wwcqwnau7lsun2ua7d63pm

Stochastic Variance Reduction for Deep Q-learning [article]

Wei-Ye Zhao, Xi-Ya Guan, Yang Liu, Xiaoming Zhao, Jian Peng
2019 arXiv   pre-print
With extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on 18 out of 20 games.  ...  Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications.  ...  for deep Q-learning by reducing the AGE variance.  ... 
arXiv:1905.08152v1 fatcat:dvmbg6ly5vaenl3badbojsm7xq

An Optimistic Perspective on Offline Reinforcement Learning [article]

Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
2020 arXiv   pre-print
This paper studies offline RL using the DQN replay dataset comprising the entire replay experience of a DQN agent on 60 Atari 2600 games.  ...  We demonstrate that recent off-policy deep RL algorithms, even when trained solely on this fixed dataset, outperform the fully trained DQN agent.  ...  Acknowledgements We thank Pablo Samuel Castro for help in understanding and debugging issues with the Dopamine codebase and reviewing an early draft of the paper.  ... 
arXiv:1907.04543v4 fatcat:ocqec67o7zhvtlsz4sjlwvoa7e

Randomized Value Functions via Multiplicative Normalizing Flows [article]

Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent
2019 arXiv   pre-print
In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to achieve randomized  ...  This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods.  ...  We train each agent with ten different random seeds for each chain length.  ... 
arXiv:1806.02315v3 fatcat:6nbtlhjvo5e4pbzojhvudaroje

Combo-Action: Training Agent For FPS Game with Auxiliary Tasks

Shiyu Huang, Hang Su, Jun Zhu, Ting Chen
We further train a deep recurrent Q-learning network model as a high-level controller, called supervisory network, to manage the Combo-Actions.  ...  Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games.  ...  We also present some work related to our method and the efforts made in the FPS game AI research field. Deep Q-learning Deep Q-learning can learn a policy by interacting with the environment.  ... 
doi:10.1609/aaai.v33i01.3301954 fatcat:mez35pzfg5gmzokknbygvpu5pe

Quasi-Newton Optimization Methods For Deep Learning Applications [article]

Jacob Rafati, Roummel F. Marcia
2019 arXiv   pre-print
Our results show a robust convergence with preferred generalization characteristics as well as fast training time.  ...  Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like  ...  We used DeepMind's Deep Q-Network (DQN) architecture, described in [37], as a function approximator for Q(s, a; w). The same architecture was used to train agents to play the different ATARI games.  ... 
arXiv:1909.01994v1 fatcat:2ctrl5kfizelpbpa3f5t4h5vuu

Deep Reinforcement Learning with Weighted Q-Learning [article]

Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi
2020 arXiv   pre-print
Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning.  ...  In this work, we provide the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective  ...  Estimation biases in Q-Learning Choosing a target value for the Q-Learning update rule can be seen as an instance of the Maximum Expected Value (MEV) estimation problem for a set of random variables, here  ... 
arXiv:2003.09280v2 fatcat:mhybvtbsofelxmd2npfrvql354

Distributed Deep Q-Learning [article]

Hao Yi Ong, Kevin Chavez, Augustus Hong
2015 arXiv   pre-print
The model is based on the deep Q-network, a convolutional neural network trained with a variant of Q-learning.  ...  We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to achieve reasonable success on a simple game with minimal parameter tuning.  ...  A. Data parallelism The serial Deep Q-learning algorithm uses stochastic gradient descent to train the Q network.  ... 
arXiv:1508.04186v2 fatcat:xpwce2w2xnafthaeffdyjktmeu

Accelerated Methods for Deep Reinforcement Learning [article]

Adam Stooke, Pieter Abbeel
2019 arXiv   pre-print
We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs.  ...  Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice.  ...  ACKNOWLEDGEMENTS Adam Stooke gratefully acknowledges the support of the Fannie & John Hertz Foundation. The DGX-1 used for this research was donated by the NVIDIA Corporation.  ... 
arXiv:1803.02811v2 fatcat:uz7reunzjzblhgl2z7boporqq4

Deep Q-learning from Demonstrations [article]

Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou (+2 others)
2017 arXiv   pre-print
This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment.  ...  We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it  ...  , Jon Scholz, David Silver, Toby Pohlen, Tom Stepleton, Ziyu Wang, and many others at DeepMind for insightful discussions, code contributions, and other efforts.  ... 
arXiv:1704.03732v4 fatcat:aojkn6wbozc6xlcfdfylqsfr6y

ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations [article]

Daniel Seita, David Chan, Roshan Rao, Chen Tang, Mandi Zhao, John Canny
2019 arXiv   pre-print
Prior work, such as the popular Deep Q-learning from Demonstrations (DQfD) algorithm, has generally focused on single demonstrators.  ...  Learning from demonstrations is a popular tool for accelerating and reducing the exploration requirements of reinforcement learning.  ...  A prominent example of this for discrete control is Deep Q-learning from Demonstrations (DQfD) [11], the most relevant prior work, which seeded a learner agent with a small batch of human demonstrator  ... 
arXiv:1910.12154v1 fatcat:7anddutaa5h2daie6yadypkzmi

Baselines for Reinforcement Learning in Text Games [article]

Mikuláš Zelinka
2018 arXiv   pre-print
We also present pyfiction, an open-source library for universal access to different text games that could, together with our agent that implements its interface, serve as a baseline for future research  ...  Text-based games with multiple endings and rewards are a promising platform for this task, since their feedback allows us to employ reinforcement learning techniques to jointly learn text representations  ...  Deep Reinforcement Learning For finding the optimal policy in the text-game MDP, we employ Deep Reinforcement Learning (DRL) [7].  ... 
arXiv:1811.02872v1 fatcat:ya2pp4tuwffffhaxkmvvgftn6y

Model-Based Regularization for Deep Reinforcement Learning with Transcoder Networks [article]

Felix Leibfried, Peter Vrancx
2018 arXiv   pre-print
We extend conventional Deep Q-Networks (DQNs) by adding a model-learning component yielding a transcoder network.  ...  This paper proposes a new optimization objective for value-based deep reinforcement learning.  ...  Deep Q-networks [3] learn a deep neural network based Q-value approximation by performing stochastic gradient descent on the following training objective: $L^{(Q)}(\theta) = \mathbb{E}_{S,a,r,f,S'}\big[\, r + (1 - f)\,\gamma \max_{a'}$  ... 
arXiv:1809.01906v2 fatcat:2n7mubkcdfcyrebnku5nbsh2h4

Language Understanding for Text-based Games Using Deep Reinforcement Learning [article]

Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay
2015 arXiv   pre-print
We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback.  ...  In this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed.  ...  Acknowledgements We are grateful to the developers of Evennia, the game framework upon which this work is based.  ... 
arXiv:1506.08941v2 fatcat:q4rbngfyuna2xn2xjwtq57yl2a

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning [article]

Xiangxiang Chu, Hangjun Ye
2017 arXiv   pre-print
Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently.  ...  Multi-agent deep deterministic policy gradient obtained state-of-the-art results for some multi-agent games, but it does not scale well as the number of agents grows.  ...  By combining Q-learning with a deep neural network as the value function, a deep Q-network called DQN was put forward to play human-level games just depending on the pixel input, which boosted reinforcement  ... 
arXiv:1710.00336v2 fatcat:wsq3yokbpzgwvhntedmymcgg5i