28 Hits in 6.9 sec

Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [article]

Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson
2022 arXiv   pre-print
We show that, despite the non-stationarity that independent ratios cause, a monotonic improvement guarantee still arises as a result of enforcing the trust region constraint over all decentralized policies  ...  We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary  ...  We showed that, despite the non-stationarity in IPPO and MAPPO, a monotonic improvement guarantee still arises from enforcing the trust region constraint over all decentralized policies.  ... 
arXiv:2202.00082v1 fatcat:jvaj2l7mo5bj5i5hrc3owbt7qi
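
The "independent ratios" in the snippet refer to each agent clipping its own importance ratio rather than the joint ratio over all agents. A minimal sketch of that per-agent clipped surrogate (names and shapes are illustrative, not from the paper):

```python
import torch

def ippo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Per-agent PPO clipped surrogate: each agent clips its OWN
    importance ratio rather than the joint ratio over all agents,
    which is the source of the non-stationarity the paper analyzes.
    All tensors have shape [batch]."""
    ratio = torch.exp(logp_new - logp_old)   # independent per-agent ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # pessimistic (clipped) surrogate, negated for gradient descent
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```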

Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods [article]

Jian Hu, Siyue Hu, Shih-wei Liao
2021 arXiv   pre-print
Recent works have applied Proximal Policy Optimization (PPO) to multi-agent cooperative tasks, such as Independent PPO (IPPO) and vanilla Multi-agent PPO (MAPPO), which has a centralized value  ...  MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO via carefully designed agent-specific features, which may not be friendly to algorithmic utility.  ...  Independent PPO (IPPO) & Non-stationarity: IPPO trains an independent PPO agent for each agent in the multi-agent system, and the literature [3] shows that it works effectively as well in some multi-agent  ... 
arXiv:2106.14334v13 fatcat:k2cf3m2zcvhk5mzqhi4jfwipfy
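
The snippet names the idea without detail; one plausible reading is to perturb the advantage estimates with zero-mean noise before the policy update. A minimal sketch under that assumption (the Gaussian law and sigma are illustrative, not the paper's exact scheme):

```python
import torch

def noisy_advantages(advantages, sigma=0.1):
    """Zero-mean Gaussian perturbation of advantage estimates as a policy
    regularizer; sigma is an assumed tunable scale, not from the paper."""
    return advantages + sigma * torch.randn_like(advantages)
```

This plugs directly into a clipped loss such as the per-agent surrogate sketched above.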

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [article]

Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H.S. Torr, Mingfei Sun, Shimon Whiteson
2020 arXiv   pre-print
We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.  ...  Most recently developed approaches to cooperative multi-agent reinforcement learning in the centralized training with decentralized execution setting involve estimating a centralized, joint value function  ...  Trust region optimisation for reinforcement learning was popularised by TRPO [21], which implements iterative guaranteed monotonic improvements.  ... 
arXiv:2011.09533v1 fatcat:6fix2rj3jngbbgnplr6rhxrbfu
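
The "iterative guaranteed monotonic improvements" of TRPO rest on the standard lower bound from Schulman et al. [21], where $L_\pi$ is the local surrogate objective and $\epsilon = \max_{s,a} |A_\pi(s,a)|$:

```latex
\eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) \;-\; C\, D_{\mathrm{KL}}^{\max}(\pi, \tilde{\pi}),
\qquad C = \frac{4\,\epsilon\,\gamma}{(1-\gamma)^{2}}
```

Maximizing the right-hand side at every iteration makes the true return $\eta$ non-decreasing, which is the guarantee the decentralized variants above try to preserve.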

Distributed Reinforcement Learning for Robot Teams: A Review [article]

Yutong Wang, Mehul Damani, Pamela Wang, Yuhong Cao, Guillaume Sartoretti
2022 arXiv   pre-print
Recent findings: Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability.  ...  Summary: This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches.  ... 
arXiv:2204.03516v1 fatcat:iga6xlexmjbbflvuv5pjhifggy

Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning [article]

Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao
2022 arXiv   pre-print
Recent variants of QMIX aim to relax the monotonicity constraint of QMIX, allowing for performance improvements in SMAC.  ...  In this paper, we investigate the code-level optimizations of these variants and the monotonicity constraint. (1) We find that the improvements of these variants are significantly affected by various code-level  ...  However, IQL does not address the non-stationarity introduced by the changing policies of the learning agents.  ... 
arXiv:2102.03479v18 fatcat:woylyndr5vb7dcy7shafwkqaf4
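
The monotonicity constraint in question is $\partial Q_{tot} / \partial Q_a \ge 0$, which QMIX enforces by making the mixing weights non-negative. A minimal sketch of that mechanism (the hypernetworks producing the state-conditioned weights are omitted):

```python
import torch
import torch.nn.functional as F

def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style monotonic mixing: abs() on the state-conditioned weights
    guarantees dQ_tot/dQ_a >= 0 for every agent utility Q_a.
    agent_qs: [batch, n_agents], w1: [batch, n_agents, hidden],
    b1: [batch, 1, hidden], w2: [batch, hidden, 1], b2: [batch, 1, 1]."""
    hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), torch.abs(w1)) + b1)
    return (torch.bmm(hidden, torch.abs(w2)) + b2).view(-1)  # Q_tot: [batch]
```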

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition [article]

Wenhao Li, Xiangfeng Wang, Bo Jin, Junjie Sheng, Hongyuan Zha
2022 arXiv   pre-print
MAMT can approximately constrain the consecutive joint policies' divergence to satisfy δ-stationarity and alleviate the non-stationarity problem.  ...  Some existing works have discussed various consequences caused by non-stationarity with several kinds of measurement indicators.  ... 
arXiv:2102.10616v2 fatcat:rzla4wpuk5bpdb47f3t5gagl2u
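
The abstract does not define δ-stationarity, so the following is only a generic way to keep consecutive policies close, not necessarily the paper's MAMT mechanism: stop the inner update loop once the estimated KL divergence to the previous policy exceeds a budget.

```python
def kl_bounded_updates(update_fn, kl_fn, max_kl=0.01, max_epochs=10):
    """Run inner policy-update epochs, stopping early once the empirical
    KL(pi_old || pi_new) returned by kl_fn exceeds max_kl; this bounds
    the divergence between consecutive policies."""
    for _ in range(max_epochs):
        update_fn()              # one epoch of surrogate maximization
        if kl_fn() > max_kl:     # trust-region budget exhausted
            break
```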

FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer [article]

Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How
2021 arXiv   pre-print
To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee.  ...  We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically.  ... 
arXiv:2006.11419v4 fatcat:x3qzrb2wtve6hoplr76c6uhzii

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey [article]

Tianxu Li, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Qihui Wu, Yang Zhang, Bing Chen
2022 arXiv   pre-print
Each entity may need to make its local decision to improve the network performance under dynamic and uncertain network environments.  ...  However, such an algorithm fails to model the cooperation or competition among network entities, and simply treats other entities as part of the environment, which may result in the non-stationarity  ...  However, it may suffer from the non-stationarity issue, and the convergence of the learning policy cannot be guaranteed.  ... 
arXiv:2110.13484v2 fatcat:u2o5uxms65gmnp3q7xbh35l5oi
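
"Treats other entities as a part of the environment" is exactly what independent Q-learning does; a tabular sketch of one agent's update (names illustrative):

```python
def iql_update(Q, obs, action, reward, next_obs, n_actions,
               alpha=0.1, gamma=0.99):
    """Independent Q-learning for a single agent: other agents never appear
    in the update, so their changing policies are experienced only as a
    non-stationary environment. Q: dict mapping (obs, action) -> value."""
    best_next = max(Q.get((next_obs, a), 0.0) for a in range(n_actions))
    q_sa = Q.get((obs, action), 0.0)
    Q[(obs, action)] = q_sa + alpha * (reward + gamma * best_next - q_sa)
```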

Coordinated Proximal Policy Optimization [article]

Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo
2021 arXiv   pre-print
We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations.  ...  MAPPO) under typical multi-agent settings, including cooperative matrix games and the StarCraft II micromanagement tasks.  ... 
arXiv:2111.04051v1 fatcat:mhqi4rds7zbcrktolczas2sgnu

A Survey and Critique of Multiagent Deep Reinforcement Learning [article]

Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor
2019 arXiv   pre-print
Self-play is a useful concept for learning algorithms (e.g., fictitious play [187]) since under certain classes of games it can guarantee convergence, and it has been used as a standard technique  ... 
arXiv:1810.05587v2 fatcat:h4ei5zx2xfa7xocktlefjrvef4

Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments [article]

Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman
2021 arXiv   pre-print
Unlike cooperative environments, where agents strive towards a common goal, mixed environments are notorious for conflicts between selfish and social interests.  ...  The reason for that is the inherent non-stationarity of multi-agent environments (Laurent et al., 2011; Hernandez-Leal et al., 2017).  ...  However, CRS implies decentralized training and does not address several crucial issues of MARL, such as credit assignment, partial observability, and inherent non-stationarity (Agogino & Tumer, 2004;  ... 
arXiv:2102.12307v1 fatcat:sin7m3schzayjfgerdbldiuj3q

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms [article]

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
2021 arXiv   pre-print
the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc.  ...  Though MARL is empirically successful, its theoretical foundations are relatively lacking in the literature.  ...  This partial information aggravates the issues caused by non-stationarity, as the samples can hardly recover the exact behavior of the opponents' underlying policies, which increases the non-stationarity  ... 
arXiv:1911.10635v2 fatcat:ihlhtjlhnrdizbkcfzsnz5urfq

Applications of Reinforcement Learning in Deregulated Power Market: A Comprehensive Review [article]

Ziqing Zhu, Ze Hu, Ka Wing Chan, Siqi Bu, Bin Zhou, Shiwei Xia
2022 arXiv   pre-print
The optimal bidding strategy and dispatching methodology under these new paradigms are prioritized concerns for both market participants and power system operators, with obstacles of uncertain characteristics  ...  For each application, apart from a paradigmatic summary of generalized methodology, in-depth discussions of applicability and obstacles while deploying RL techniques are also provided.  ...  Hence, the non-negativity of this advantage function ensures that the policy is continuously and monotonically updated towards optimality, which is the key insight of the TRPO algorithm.  ... 
arXiv:2205.08369v1 fatcat:yqdarokpnzf4zitcilkgigq4vm
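
The "key insight" the snippet paraphrases is the performance-difference identity underlying TRPO: the new policy's return is the old one plus the expected discounted advantage,

```latex
\eta(\tilde{\pi}) \;=\; \eta(\pi) \;+\; \mathbb{E}_{\tau \sim \tilde{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t)\right]
```

so keeping the expected advantage non-negative guarantees a monotone improvement of the return.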

Off-Policy Correction For Multi-Agent Reinforcement Learning [article]

Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś
2022 arXiv   pre-print
Furthermore, our algorithm is theoretically grounded - we prove a fixed-point theorem that guarantees convergence.  ...  Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.  ... 
arXiv:2111.11229v2 fatcat:2wce75drbnguvmrxfgo7kqmwsu
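
The abstract does not spell out the correction itself; a standard ingredient in this family is a truncated importance weight in the spirit of V-trace (the clip level rho_bar is an assumption here, not a detail from the paper):

```python
import torch

def truncated_is_weights(logp_target, logp_behavior, rho_bar=1.0):
    """Truncated importance weights rho = min(rho_bar, pi/mu): corrects
    off-policy data while capping the variance of the correction."""
    return torch.clamp(torch.exp(logp_target - logp_behavior), max=rho_bar)
```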

Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning [article]

Matthieu Zimmer, Claire Glanois, Umer Siddique, Paul Weng
2021 arXiv   pre-print
Our solution method is generic and can be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized.  ...  As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness.  ...  Training Schedule Because the decentralized execution of independent policies causes non-stationarity in the gathered experience, like previous MADRL methods (Foerster et al., 2016; Jiang & Lu, 2019)  ... 
arXiv:2012.09421v4 fatcat:c6vdqrk7trdldonkkehyiubhce
Showing results 1 — 15 out of 28 results