
Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [article]

Jaskirat Singh, Liang Zheng
2021 arXiv   pre-print
While such a strategy is helpful for generalization, the use of multiple scenes significantly increases the variance of samples collected for policy gradient computations. ... Training deep reinforcement learning agents on environments with multiple levels / scenes from the same task has become essential for many applications aiming to achieve generalization and domain transfer. ... Policy Gradient Formulation for Multi-Scene Environments: In this section, we provide a mathematical derivation for extending the variance reduction formulation for policy gradient algorithms to multi-scene ...
arXiv:2102.07266v1 fatcat:kl2cqnh3urc55cctwn4qiurwnm
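
A note on the variance issue flagged in this entry's snippet: when trajectories are pooled from many scenes, a common remedy is a baseline-subtracted policy gradient estimate. The sketch below is illustrative only; the Step container and the per-scene mean-return baseline are assumptions of this sketch, not the authors' dynamic value estimation model.

    # Illustrative sketch: baseline-subtracted REINFORCE terms pooled over
    # trajectories collected from different scenes of the same task.
    from dataclasses import dataclass
    from typing import Dict, List
    import numpy as np

    @dataclass
    class Step:
        grad_log_pi: np.ndarray   # gradient of log pi(a|s) w.r.t. policy params
        return_to_go: float       # empirical discounted return from this step
        scene_id: int             # which scene / level the step came from

    def policy_gradient(steps: List[Step], per_scene_baseline: bool) -> np.ndarray:
        """Mean of grad_log_pi * (return - baseline) over all steps.

        With per_scene_baseline=True, each step is compared against the mean
        return of its own scene, which removes the between-scene component of
        the return variance; otherwise a single global mean return is used.
        """
        global_b = float(np.mean([s.return_to_go for s in steps]))
        returns_by_scene: Dict[int, List[float]] = {}
        for s in steps:
            returns_by_scene.setdefault(s.scene_id, []).append(s.return_to_go)
        scene_b = {k: float(np.mean(v)) for k, v in returns_by_scene.items()}

        terms = []
        for s in steps:
            b = scene_b[s.scene_id] if per_scene_baseline else global_b
            terms.append(s.grad_log_pi * (s.return_to_go - b))
        return np.mean(terms, axis=0)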

A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs

Jingkai Mao, Jakob N. Foerster, Tim Rocktäschel, Maruan Al-Shedivat, Gregory Farquhar, Shimon Whiteson
2019 International Conference on Machine Learning  
arise in, e.g., multi-agent reinforcement learning and meta-learning. ... applications of higher order gradients in reinforcement learning and meta-learning. ... Conclusion: Recent progress in multi-agent reinforcement learning and meta-learning has led to a variety of approaches that employ second order gradient estimators. ...
dblp:conf/icml/MaoFRAFW19 fatcat:s6nktfh2bfeoda2tehxu3vk2ua

Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [article]

Jaskirat Singh, Liang Zheng
2020 arXiv   pre-print
While such a strategy is helpful for generalization, the use of multiple scenes significantly increases the variance of samples collected for policy gradient computations. ... Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task has become essential for many applications aiming to achieve generalization and ... extensive analysis for enhanced variance reduction in multi-scene reinforcement learning. ...
arXiv:2005.12254v1 fatcat:jm6v6taph5bw3iqtjsxsox7gci

Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning [article]

Yuchen Xiao, Xueguang Lyu, Christopher Amato
2021 arXiv   pre-print
Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity) ... By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents' choices that implicitly ... history-value function V_w^{π_θ}(τ) in POMDPs as the critic and incorporate it into the policy gradient in a variance reduction ...
arXiv:2110.08642v3 fatcat:cgkvfeqzdjd6znwobewgarqct4
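
The "expected advantage action-value over other agents' choices" mentioned in this entry can be illustrated with a counterfactual-style local advantage: agent i's chosen action is compared against the expectation over its own actions while the other agents' actions stay fixed. This is a generic sketch under that reading, with hypothetical inputs, not the paper's exact local critic.

    import numpy as np

    def local_advantage(q_values: np.ndarray, pi_i: np.ndarray, a_i: int) -> float:
        """Counterfactual-style advantage for agent i.

        q_values[a] : critic estimate of the joint action value when agent i
                      plays `a` and the other agents' actions are held fixed.
        pi_i[a]     : probability agent i's policy assigns to action `a`.
        a_i         : the action agent i actually took.
        """
        baseline = float(np.dot(pi_i, q_values))   # E_{a ~ pi_i}[Q(s, a, a_-i)]
        return float(q_values[a_i]) - baseline

    # Toy usage with three actions for agent i.
    q = np.array([1.0, 0.5, -0.2])
    pi = np.array([0.6, 0.3, 0.1])
    adv = local_advantage(q, pi, a_i=0)   # > 0: action 0 beats the local baseline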

Reinforcement learning with a network of spiking agents [article]

Sneha Aenugu, Abhishek Sharma, Sasikiran Yelamarthi, Hananel Hazan, Philip S. Thomas, Robert Kozma
2019 arXiv   pre-print
We build on this theory to propose a multi-agent learning framework with spiking neurons in the generalized linear model (GLM) formulation as agents, to solve reinforcement learning (RL) tasks. ... We show that a network of GLM spiking agents connected in a hierarchical fashion, where each spiking agent modulates its firing policy based on local information and a global prediction error, can learn ... In this study, we demonstrate that a multi-agent RL framework, with each agent modeled after the GLM model of a spiking neuron (Pillow et al., 2008), can learn complex stimulus-action mappings with local ...
arXiv:1910.06489v3 fatcat:p67vvwl7evfore7qyvnk72vdsu

MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

Zhanhong Jiang, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M Lee, Chinmay Hegde, Soumik Sarkar
2022 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. ... Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings. ... Introduction: Multi-agent reinforcement learning (MARL) is an emerging topic that has been explored in both theoretical (Nguyen et al. 2014; Qu et al. 2019; Zhang et al. 2021b) and empirical settings ...
doi:10.1609/aaai.v36i9.21169 fatcat:tz3ywlxaqnbmhiqmetahmm5sqa
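
The momentum-based variance reduction referred to in this abstract is, in spirit, a STORM-style recursive estimator; the sketch below shows only that generic recursion (MDPGT itself additionally uses importance-sampling corrections and decentralized gradient tracking across agents, which are omitted here).

    import numpy as np

    def momentum_vr_estimate(grad_at_x_t: np.ndarray,
                             grad_at_x_prev: np.ndarray,
                             d_prev: np.ndarray,
                             beta: float) -> np.ndarray:
        """STORM-style momentum estimator
            d_t = g(x_t; xi_t) + (1 - beta) * (d_{t-1} - g(x_{t-1}; xi_t)),
        where both stochastic gradients are evaluated on the *same* sample
        xi_t; the correction term is what shrinks the variance of d_t.
        """
        return grad_at_x_t + (1.0 - beta) * (d_prev - grad_at_x_prev)

    def ascent_step(theta: np.ndarray, d_t: np.ndarray, step_size: float) -> np.ndarray:
        # Gradient-ascent step on policy parameters using the tracked direction.
        return theta + step_size * d_t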

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines [article]

Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel
2018 arXiv   pre-print
Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. ... Finally, we show that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks. ... We believe that our method will facilitate further applications of reinforcement learning methods in domains with extremely high-dimensional actions, including multi-agent systems. ...
arXiv:1803.07246v1 fatcat:qpi4vecz3jbwxdmaxpjfesixr4
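
For a factorized policy pi(a|s) = prod_i pi_i(a_i|s), the action-dependent baseline idea lets each dimension i use a baseline b_i(s, a_{-i}) that may depend on the other dimensions' actions but not on a_i, which keeps the estimator unbiased. A minimal sketch of the per-dimension terms follows; the names are generic, not the paper's code.

    from typing import List
    import numpy as np

    def factorized_baseline_terms(grad_log_pi_per_dim: List[np.ndarray],
                                  q_hat: float,
                                  baselines: List[float]) -> List[np.ndarray]:
        """Per-dimension policy gradient terms for a factorized policy.

        grad_log_pi_per_dim[i] : gradient of log pi_i(a_i | s)
        q_hat                  : sampled return / Q estimate for the joint action
        baselines[i]           : b_i(s, a_{-i}); independent of a_i, so
                                 E[grad_log_pi_i * b_i] = 0 and the estimate
                                 stays unbiased while the variance drops.
        """
        return [g * (q_hat - b) for g, b in zip(grad_log_pi_per_dim, baselines)]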

Empirical Analysis of Policy Gradient Algorithms where Starting States are Sampled accordingly to Most Frequently Visited States

Samy Aittahar, Raphaël Fonteneau, Damien Ernst
2020 IFAC-PapersOnLine  
the reinforcement learning task. ... by the agent with its current policy in previous episodes. ...
doi:10.1016/j.ifacol.2020.12.2279 fatcat:qrgjk2acrngatnvpse74ebamga

Reducing Variance in Gradient Bandit Algorithm using Antithetic Variates Method

Sihao Yu, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
Policy gradient, which makes use of the Monte Carlo method to get an unbiased estimate of the parameter gradients, has been widely used in reinforcement learning. ... From the viewpoint of statistics, policy gradient with baseline, a successful variance reduction method for policy gradient, directly applies the control variates method, a traditional variance reduction ... Background: Variance Reduction in Policy Gradient. This section introduces the formulation of variance reduction methods in the policy gradient for the multi-armed bandit problem. ...
doi:10.1145/3209978.3210068 dblp:conf/sigir/YuXLGC18 fatcat:ocaa4bc6dzaidkqnzsltlipdiy
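
For reference, the "policy gradient with baseline" that this snippet casts as a control variate is, in the bandit setting, the standard gradient bandit update over action preferences (as in Sutton and Barto); the paper's antithetic-variates estimator is not reproduced here.

    import numpy as np

    def gradient_bandit_update(H: np.ndarray, a: int, reward: float,
                               avg_reward: float, alpha: float) -> np.ndarray:
        """One preference update with the running average reward as baseline:
            H[a]      += alpha * (R - avg_R) * (1 - pi[a])
            H[other]  -= alpha * (R - avg_R) * pi[other]
        """
        pi = np.exp(H - H.max())
        pi /= pi.sum()
        H = H.copy()
        H -= alpha * (reward - avg_reward) * pi    # applies to every action...
        H[a] += alpha * (reward - avg_reward)      # ...then corrects the chosen one
        return H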

Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning [article]

Ross E. Allen, Jayesh K. Gupta, Jaime Pena, Yutai Zhou, Javona White Bear, Mykel J. Kochenderfer
2021 arXiv   pre-print
We show significant improvement in learning performance compared to policy gradient methods that do not perform multi-agent credit assignment. ... We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. ... counterfactual multi-agent (COMA) policy gradients. ...
arXiv:1908.01022v4 fatcat:lu6yn2rvefcddjnqrvgibla5gu

Stochastic Variance Reduction for Policy Gradient Estimation [article]

Tianbing Xu, Qiang Liu, Jian Peng
2018 arXiv   pre-print
Recent advances in policy gradient methods and deep learning have demonstrated their applicability to complex reinforcement learning problems. ... In this paper, we apply stochastic variance-reduced gradient descent (SVRG) to model-free policy gradient to significantly improve sample efficiency. ... Reinforcement Learning: Consider an agent operating in an uncertain environment. ...
arXiv:1710.06034v4 fatcat:x3jwl3z35jbthakcn4u7mr7yiu
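
The SVRG estimator mentioned in this entry has the familiar three-term form; the sketch below is the generic (supervised-style) version, whereas applying it to policy gradients also requires importance weighting because the sampling distribution moves with the policy.

    import numpy as np

    def svrg_gradient(grad_i_at_w: np.ndarray,
                      grad_i_at_snapshot: np.ndarray,
                      full_grad_at_snapshot: np.ndarray) -> np.ndarray:
        """SVRG estimator  g = grad f_i(w) - grad f_i(w~) + mu,
        where w~ is a parameter snapshot and mu is the full-batch gradient at
        that snapshot; g is unbiased and its variance shrinks as w approaches
        the snapshot.
        """
        return grad_i_at_w - grad_i_at_snapshot + full_grad_at_snapshot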

Multi-View Reinforcement Learning [article]

Minne Li, Lisheng Wu, Haitham Bou Ammar, Jun Wang
2019 arXiv   pre-print
This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. ... Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments. ... Model-Free Multi-View Reinforcement Learning through Observation Augmentation: The type of algorithm we employ for model-free multi-view reinforcement learning falls in the class of policy gradient algorithms ...
arXiv:1910.08285v1 fatcat:xng5iej2wza2jc7yzqh53opt3q

A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning

Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato
2022 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Centralized Training for Decentralized Execution, where training is done in a centralized offline fashion, has become a popular solution paradigm in Multi-Agent Reinforcement Learning. ... In this paper, we show that state-based critics can introduce bias in the policy gradient estimates, potentially undermining the asymptotic guarantees of the algorithm. ... This turns out not to be unusual in current multi-agent reinforcement learning benchmarks (see Sections 5.3 and 5.4). ...
doi:10.1609/aaai.v36i9.21171 fatcat:ml4zb26qhrdlrmpy62tbi7hiyi

Training spiking neural networks using reinforcement learning [article]

Sneha Aenugu
2020 arXiv   pre-print
In one approach, we consider each neuron in a multi-layer neural network as an independent RL agent forming a different representation of the feature space, while the network as a whole forms the representation ... We primarily focus on investigating the candidacy of reinforcement learning (RL) rules in solving the spatial and temporal credit assignment problems to enable decision-making in complex tasks. ... Learning by reinforcement in spiking neural networks; Multi-agent learning; Background, Preliminaries and Notation: A reinforcement learning (RL) domain expressed as a Markov Decision Process (MDP) is ...
arXiv:2005.05941v1 fatcat:f4d6y642x5fvnjb7iftfpwwyau

Robust Reinforcement Learning for Autonomous Driving

Yesmina Jaâfra, Jean Luc Laurent, Aline Deruyver, Mohamed Saber Naceur
2019 International Conference on Learning Representations  
In this work, we propose a deep reinforcement learning (RL) algorithm embedding an actor-critic architecture with multi-step returns to achieve better robustness of the agent's learning strategies when ... The developed deep actor RL guided by a policy-evaluator critic distinctly surpasses the performance of a standard deep RL agent. ... In order to reduce the variance of the policy gradient and stabilize learning, we can subtract a baseline function, e.g. the state value function, from the policy gradient. ...
dblp:conf/iclr/JaafraLDN19 fatcat:nazjhl4fbra3bok3us6jwezdsy
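
The baseline subtraction described in the last fragment of this snippet is the textbook advantage form of the policy gradient (a standard identity, not this paper's specific estimator):

    \nabla_\theta J(\theta)
      = \mathbb{E}_{s,a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\,
        \bigl( Q^{\pi_\theta}(s,a) - V^{\pi_\theta}(s) \bigr) \right]

Subtracting the state-value baseline leaves the estimate unbiased because \mathbb{E}_{a \sim \pi_\theta}[\nabla_\theta \log \pi_\theta(a \mid s)] = 0 for every state s, while it typically lowers the variance of the sampled gradient.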
Showing results 1–15 out of 5,171 results