A Deep Q-Learning Agent for the L-Game with Variable Batch Training
[article]
2018
arXiv
pre-print
We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states. ...
We also employ variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training. ...
Conclusion: In this paper, we developed a game-playing agent based on Deep Q-Learning for a challenging board game. ...
arXiv:1802.06225v1
fatcat:v672wwcqwnau7lsun2ua7d63pm
Stochastic Variance Reduction for Deep Q-learning
[article]
2019
arXiv
pre-print
With extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on 18 out of 20 games. ...
Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. ...
for deep Q-learning by reducing the AGE variance. ...
arXiv:1905.08152v1
fatcat:dvmbg6ly5vaenl3badbojsm7xq
An Optimistic Perspective on Offline Reinforcement Learning
[article]
2020
arXiv
pre-print
This paper studies offline RL using the DQN replay dataset comprising the entire replay experience of a DQN agent on 60 Atari 2600 games. ...
We demonstrate that recent off-policy deep RL algorithms, even when trained solely on this fixed dataset, outperform the fully trained DQN agent. ...
Acknowledgements: We thank Pablo Samuel Castro for help in understanding and debugging issues with the Dopamine codebase and reviewing an early draft of the paper. ...
arXiv:1907.04543v4
fatcat:ocqec67o7zhvtlsz4sjlwvoa7e
Randomized Value Functions via Multiplicative Normalizing Flows
[article]
2019
arXiv
pre-print
In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to achieve randomized ...
This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods. ...
We train each agent with ten different random seeds for each chain length. ...
arXiv:1806.02315v3
fatcat:6nbtlhjvo5e4pbzojhvudaroje
Combo-Action: Training Agent For FPS Game with Auxiliary Tasks
2019
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
We further train a deep recurrent Q-learning network model as a high-level controller, called supervisory network, to manage the Combo-Actions. ...
Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. ...
We also present some work related to our method and the efforts made in the FPS game AI research field.
Deep Q-learning: Deep Q-learning can learn a policy by interacting with the environment. ...
doi:10.1609/aaai.v33i01.3301954
fatcat:mez35pzfg5gmzokknbygvpu5pe
Quasi-Newton Optimization Methods For Deep Learning Applications
[article]
2019
arXiv
pre-print
Our results show a robust convergence with preferred generalization characteristics as well as fast training time. ...
Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like ...
We used DeepMind's Deep Q-Network (DQN) architecture, described in [37], as a function approximator for Q(s, a; w). The same architecture was used to train agents to play the different Atari games. ...
arXiv:1909.01994v1
fatcat:2ctrl5kfizelpbpa3f5t4h5vuu
Deep Reinforcement Learning with Weighted Q-Learning
[article]
2020
arXiv
pre-print
Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. ...
In this work, we provide the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective ...
Estimation biases in Q-Learning: Choosing a target value for the Q-Learning update rule can be seen as an instance of the Maximum Expected Value (MEV) estimation problem for a set of random variables, here ...
arXiv:2003.09280v2
fatcat:mhybvtbsofelxmd2npfrvql354
Distributed Deep Q-Learning
[article]
2015
arXiv
pre-print
The model is based on the deep Q-network, a convolutional neural network trained with a variant of Q-learning. ...
We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to achieve reasonable success on a simple game with minimal parameter tuning. ...
A. Data parallelism: The serial Deep Q-learning algorithm uses stochastic gradient descent to train the Q network. ...
arXiv:1508.04186v2
fatcat:xpwce2w2xnafthaeffdyjktmeu
Accelerated Methods for Deep Reinforcement Learning
[article]
2019
arXiv
pre-print
We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. ...
Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. ...
Acknowledgements: Adam Stooke gratefully acknowledges the support of the Fannie & John Hertz Foundation. The DGX-1 used for this research was donated by the NVIDIA Corporation. ...
arXiv:1803.02811v2
fatcat:uz7reunzjzblhgl2z7boporqq4
Deep Q-learning from Demonstrations
[article]
2017
arXiv
pre-print
This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. ...
We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it ...
, Jon Scholz, David Silver, Toby Pohlen, Tom Stepleton, Ziyu Wang, and many others at DeepMind for insightful discussions, code contributions, and other efforts. ...
arXiv:1704.03732v4
fatcat:aojkn6wbozc6xlcfdfylqsfr6y
ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations
[article]
2019
arXiv
pre-print
Prior work, such as the popular Deep Q-learning from Demonstrations (DQfD) algorithm, has generally focused on single demonstrators. ...
Learning from demonstrations is a popular tool for accelerating and reducing the exploration requirements of reinforcement learning. ...
A prominent example of this for discrete control is Deep Q-learning from Demonstrations (DQfD) [11], the most relevant prior work, which seeded a learner agent with a small batch of human demonstrator ...
arXiv:1910.12154v1
fatcat:7anddutaa5h2daie6yadypkzmi
Baselines for Reinforcement Learning in Text Games
[article]
2018
arXiv
pre-print
We also present pyfiction, an open-source library for universal access to different text games that could, together with our agent that implements its interface, serve as a baseline for future research ...
Text-based games with multiple endings and rewards are a promising platform for this task, since their feedback allows us to employ reinforcement learning techniques to jointly learn text representations ...
Deep Reinforcement Learning: To find the optimal policy in the text-game MDP, we employ Deep Reinforcement Learning (DRL) [7]. ...
arXiv:1811.02872v1
fatcat:ya2pp4tuwffffhaxkmvvgftn6y
Model-Based Regularization for Deep Reinforcement Learning with Transcoder Networks
[article]
2018
arXiv
pre-print
We extend conventional Deep Q-Networks (DQNs) by adding a model-learning component yielding a transcoder network. ...
This paper proposes a new optimization objective for value-based deep reinforcement learning. ...
Deep Q-networks [3] learn a deep-neural-network-based Q-value approximation by performing stochastic gradient descent on the following training objective: L^(Q)(θ) = E_{S,a,r,f,S'} r + (1 − f) γ max_a ...
arXiv:1809.01906v2
fatcat:2n7mubkcdfcyrebnku5nbsh2h4
Language Understanding for Text-based Games Using Deep Reinforcement Learning
[article]
2015
arXiv
pre-print
We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback. ...
In this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed. ...
Acknowledgements: We are grateful to the developers of Evennia, the game framework upon which this work is based. ...
arXiv:1506.08941v2
fatcat:q4rbngfyuna2xn2xjwtq57yl2a
Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning
[article]
2017
arXiv
pre-print
Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. ...
Multi-agent deep deterministic policy gradient has obtained state-of-the-art results for some multi-agent games, but it does not scale well with a growing number of agents. ...
By combining Q-learning with a deep neural network as the value function, a deep Q-network called DQN was put forward to play games at a human level from pixel input alone, which boosted reinforcement ...
arXiv:1710.00336v2
fatcat:wsq3yokbpzgwvhntedmymcgg5i