
Domain-Independent Optimistic Initialization for Reinforcement Learning [article]

Marlos C. Machado, Sriram Srinivasan, Michael Bowling
2014 arXiv   pre-print
In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration.  ...  We present a simple approach that performs optimistic initialization with less dependence on the domain.  ...  This research was supported by Alberta Innovates Technology Futures and the Alberta Innovates Centre for Machine Learning and computing resources provided by Compute Canada through Westgrid.  ... 
arXiv:1410.4604v1 fatcat:lni7etwms5a3zlmizp3icaa654

Guiding Robot Exploration in Reinforcement Learning via Automated Planning [article]

Yohei Hayamizu, Saeid Amiri, Kishan Chandan, Keiki Takadama, Shiqi Zhang
2021 arXiv   pre-print
Reinforcement learning (RL) enables an agent to learn from trial-and-error experiences toward achieving long-term goals; automated planning aims to compute plans for accomplishing tasks using action knowledge  ...  The action knowledge is used for generating artificial experiences from an optimistic simulation.  ...  Optimistic Initialization The plans computed by the automated planner are referred to as optimistic plans, because real-world domain uncertainty is frequently overlooked in building the planners.  ... 
arXiv:2004.11456v2 fatcat:jehwm6urbvdcxiedrsnpgehdzm

Towards Finite-Sample Convergence of Direct Reinforcement Learning [chapter]

Shiau Hong Lim, Gerald DeJong
2005 Lecture Notes in Computer Science  
We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and constant learning rate is admissible.  ...  While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have yet supported theoretical guarantees for finite-sample convergence.  ...  We believe there is a theoretical justification for the observed good behavior of direct reinforcement learning under optimistic initial Q-values and a small constant learning rate.  ... 
doi:10.1007/11564096_25 fatcat:nfo2wo4ex5gkdcwbjrzfgas4am

Balancing Value Underestimation and Overestimation with Realistic Actor-Critic [article]

Sicen Li, Gang Wang, Qinyun Tang, Liquan Wang
2021 arXiv   pre-print
Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains.  ...  With the guide of these critics, RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn many optimistic and pessimistic policies with the same neural network.  ...  Wang are with Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering ACKNOWLEDGMENT The authors are grateful to the Editor-in-Chief, the Associate Editor, and anonymous reviewers for  ... 
arXiv:2110.09712v3 fatcat:w3zyjue2zfae5le4wggx6g4y6q

Can good learners always compensate for poor learners?

Keith Sullivan, Liviu Panait, Gabriel Balan, Sean Luke
2006 Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems - AAMAS '06  
Can a good learner compensate for a poor learner when paired in a coordination game?  ...  We give a straightforward extension to the coordination game in which FMQ cannot compensate for the lesser algorithm.  ...  LMRL is initially optimistic about a and especially about b, but it cannot recover from RL's lock onto c and it eventually settles on c itself.  ... 
doi:10.1145/1160633.1160777 dblp:conf/atal/SullivanPBL06 fatcat:ozlb5yo2zbedhbspaeknsajaze

Likelihood Quantile Networks for Coordinating Multi-Agent Reinforcement Learning [article]

Xueguang Lyu, Christopher Amato
2020 arXiv   pre-print
Recently proposed deep multi-agent reinforcement learning methods have tried to mitigate this non-stationarity by attempting to determine which samples are from other agent exploration or suboptimality  ...  We also explore the effect of risk-seeking strategies for adjusting learning over time and propose adaptive risk distortion functions which guide risk sensitivity.  ...  CONCLUSION This paper describes a novel distributional RL method for improving performance in cooperative multi-agent reinforcement learning settings.  ... 
arXiv:1812.06319v6 fatcat:6lsfmhoww5ffjlkwvuwrw4z5je

Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat
2012 Knowledge engineering review (Print)  
Moreover the performance of a range of algorithms for independent reinforcement learners is evaluated empirically.  ...  In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate.  ...  Conclusion This paper presented a comprehensive review of reinforcement learning algorithms for independent agents in cooperative multi-agent systems.  ... 
doi:10.1017/s0269888912000057 fatcat:j2unyb75c5a3lmpvdny3ex77ei

Off-policy reinforcement learning with Gaussian processes

2014 IEEE/CAA Journal of Automatica Sinica  
Reinforcement Learning Reinforcement learning is concerned with finding the optimal policy π*(s) = argmax_a Q*(s, a) when P and R are unknown.  ...  Optimistic exploration is not as helpful in this domain since the pen- Figure 2: Average sum of discounted rewards for the experimental domains.  ...  Note that m * For ease of exposition, we define the following notation: Similar to proofs of other online RL algorithms, including TD learning ([42] and Q-learning [30], an ODE approach is used to  ... 
doi:10.1109/jas.2014.7004680 fatcat:kha2ycelabczzmuwpnhckzxqym

Context-Dependent Upper-Confidence Bounds for Directed Exploration [article]

Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
2021 arXiv   pre-print
Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.  ...  Such context-dependent noise focuses exploration on a subset of variable states, and allows for reduced exploration in other states.  ...  For reinforcement learning, though, there are only specialized proofs for particular algorithms using optimistic estimates [8, 31].  ... 
arXiv:1811.06629v2 fatcat:7way2gnc6rdixamvixaoqfo5ky

Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams

Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat
2007 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems  
The article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learning robots (IL) try to coordinate their individual behavior to reach a coherent joint  ...  We report an investigation of existing algorithms for the learning of coordination in cooperative MAS, and suggest a Q-Learning extension for ILs, called Hysteretic Q-Learning.  ...  Lauer & Riedmiller [12] introduced the Distributed Q-Learning algorithm. Optimistic independent agents neglect the penalties due to a non-coordination of agents in their update.  ... 
doi:10.1109/iros.2007.4399095 dblp:conf/iros/MatignonLF07 fatcat:dxhajv77b5e6zlnquch6wxjcba

Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability [article]

Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
2017 arXiv   pre-print
This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability.  ...  Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to learn and store distinct policies for each task, but in practice  ...  Acknowledgements The authors thank the anonymous reviewers for their insightful feedback and suggestions.  ... 
arXiv:1703.06182v4 fatcat:pt76xj24snafziyymv4nnqkqsy

Why is Posterior Sampling Better than Optimism for Reinforcement Learning? [article]

Ian Osband, Benjamin Van Roy
2017 arXiv   pre-print
This improves upon the best previous bound of Õ(H S √(AT)) for any reinforcement learning algorithm.  ...  Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2.  ...  , anonymous reviewers for their helpful comments and many more colleagues at DeepMind including Remi Munos, Mohammad Azar and more for inspirational conversations.  ... 
arXiv:1607.00215v3 fatcat:rjsc3uwoabbqhbx3j4fjcpvqnu

Lenient Multi-Agent Deep Reinforcement Learning [article]

Gregory Palmer, Karl Tuyls, Daan Bloembergen, Rahul Savani
2018 arXiv   pre-print
However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11].  ...  This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems.  ...  Hysteretic Q-learning is a form of optimistic learning with a strong empirical track record in fully-observable multi-agent reinforcement learning [3, 20, 37].  ... 
arXiv:1707.04402v2 fatcat:ozvqkfba7jf7vhpj7jg6imtsci

Regret Bounds for Reinforcement Learning with Policy Advice [article]

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
2013 arXiv   pre-print
We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand.  ...  In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.  ...  Gap-Independent Bound We are now ready to derive the first regret bound for RLPA. with probability at least 1 − δ for any initial state s ∈ S. Proof.  ... 
arXiv:1305.1027v2 fatcat:ngfawqhphrg6pdmzidmbq5e3mq

Hierarchical model-based reinforcement learning

Nicholas K. Jong, Peter Stone
2008 Proceedings of the 25th international conference on Machine learning - ICML '08  
Model-based algorithms, which provided the first finite-time convergence guarantees for reinforcement learning, may also play an important role in coping with the relative scarcity of data in large environments  ...  In this paper, we introduce an algorithm that fully integrates modern hierarchical and model-learning methods in the standard reinforcement learning setting.  ...  Acknowledgments This material is based upon work supported by the National Science Foundation under Grant No. 0237699 and the DARPA Bootstrap Learning program.  ... 
doi:10.1145/1390156.1390211 fatcat:hv6i4sncsnfyfayvgqega7lh24