Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO
[article]
2022
arXiv
pre-print
We show that, despite the non-stationarity that independent ratios cause, a monotonic improvement guarantee still arises as a result of enforcing the trust region constraint over all decentralized policies ...
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary ...
We showed that, despite the non-stationarity in IPPO and MAPPO, a monotonic improvement guarantee still arises from enforcing the trust region constraint over all decentralized policies. ...
arXiv:2202.00082v1
fatcat:jvaj2l7mo5bj5i5hrc3owbt7qi
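For context on the guarantee this abstract refers to: the classic single-agent trust-region bound (Schulman et al., 2015) that multi-agent results of this kind extend can be written as follows (a standard statement of the TRPO bound, not quoted from the paper above):

    % TRPO monotonic improvement bound; L_pi is the local surrogate objective.
    J(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi})
      - \frac{4\epsilon\gamma}{(1-\gamma)^{2}}
        \max_{s} D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s)\,\|\,\tilde{\pi}(\cdot \mid s)\big),
    \qquad \epsilon = \max_{s,a}\lvert A_{\pi}(s,a)\rvert

Keeping every decentralized policy inside a trust region bounds the divergence term, which is, roughly, the mechanism the abstract alludes to.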
Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
[article]
2021
arXiv
pre-print
Recent works have applied Proximal Policy Optimization (PPO) to multi-agent cooperative tasks, such as Independent PPO (IPPO), and vanilla Multi-agent PPO (MAPPO), which has a centralized value ...
MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO through carefully designed agent-specific features, which may not be friendly to algorithmic utility. ...
Independent PPO (IPPO) & Non-stationarity: IPPO trains an independent PPO agent for each agent in the multi-agent system, and the literature [3] shows that it works effectively as well in some multi-agent ...
arXiv:2106.14334v13
fatcat:k2cf3m2zcvhk5mzqhi4jfwipfy
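The core idea in this abstract, perturbing advantage estimates to regularize the policy, can be sketched in a few lines. This is a minimal illustration; the function name, the Gaussian noise model, and the scale are assumptions, not the paper's exact scheme:

    import numpy as np

    def noisy_advantages(advantages, sigma=0.1, rng=None):
        """Perturb advantage estimates with Gaussian noise before the
        actor update; the noise acts as a regularizer on the policy
        gradient (illustrative sketch only)."""
        rng = rng or np.random.default_rng()
        noise = rng.normal(loc=0.0, scale=sigma, size=np.shape(advantages))
        return advantages + noise

    # Usage: perturb one agent's batch of advantages before its update.
    adv = np.array([0.5, -0.2, 1.3])
    print(noisy_advantages(adv, sigma=0.05))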
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
[article]
2020
arXiv
pre-print
We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity. ...
Most recently developed approaches to cooperative multi-agent reinforcement learning in the centralized training with decentralized execution setting involve estimating a centralized, joint value function ...
Trust region optimisation for reinforcement learning was popularised by TRPO [21] which implements iterative guaranteed monotonic improvements. ...
arXiv:2011.09533v1
fatcat:6fix2rj3jngbbgnplr6rhxrbfu
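For reference, IPPO simply runs PPO's clipped surrogate independently for each agent on its local trajectories; a minimal NumPy sketch of that surrogate (variable names are illustrative):

    import numpy as np

    def ppo_clip_loss(ratio, advantage, eps=0.2):
        """Clipped surrogate objective from PPO; IPPO optimizes this
        per agent, with ratios computed from that agent's own policy."""
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        # Maximize the surrogate => minimize its negation.
        return -np.minimum(unclipped, clipped).mean()

    # Example: ratios pi_new/pi_old and advantages for one agent's batch.
    ratios = np.array([0.9, 1.1, 1.4])
    advs = np.array([1.0, -0.5, 2.0])
    print(ppo_clip_loss(ratios, advs))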
Distributed Reinforcement Learning for Robot Teams: A Review
[article]
2022
arXiv
pre-print
Recent findings: Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. ...
Summary: This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches. ...
arXiv:2204.03516v1
fatcat:iga6xlexmjbbflvuv5pjhifggy
Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning
[article]
2022
arXiv
pre-print
Recent variants of QMIX target relaxing the monotonicity constraint of QMIX, allowing for performance improvement in SMAC. ...
In this paper, we investigate the code-level optimizations of these variants and the monotonicity constraint. (1) We find that such improvements of the variants are significantly affected by various code-level ...
However, IQL does not address the non-stationarity introduced due to the changing policies of the learning agents. ...
arXiv:2102.03479v18
fatcat:woylyndr5vb7dcy7shafwkqaf4
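The monotonicity constraint discussed here is the QMIX property that the joint value is a monotone function of each agent's utility, enforced by non-negative mixing weights so that dQ_tot/dQ_i >= 0. A minimal sketch follows; the layer sizes and plain-array weights are simplifications (QMIX generates the weights with state-conditioned hypernetworks and uses an ELU hidden layer):

    import numpy as np

    def monotonic_mix(agent_qs, w1, b1, w2, b2):
        """QMIX-style mixer: taking absolute values of the mixing
        weights keeps dQ_tot/dQ_i >= 0 (the monotonicity constraint)."""
        hidden = np.maximum(0.0, agent_qs @ np.abs(w1) + b1)  # ReLU here for brevity
        return hidden @ np.abs(w2) + b2

    qs = np.array([1.0, -0.5, 2.0])          # per-agent utilities
    w1, b1 = np.ones((3, 4)), np.zeros(4)
    w2, b2 = np.ones((4, 1)), np.zeros(1)
    print(monotonic_mix(qs, w1, b1, w2, b2))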
Dealing with Non-Stationarity in MARL via Trust-Region Decomposition
[article]
2022
arXiv
pre-print
MAMT can approximately constrain the consecutive joint policies' divergence to satisfy δ-stationarity and alleviate the non-stationarity problem. ...
Some existing works have discussed various consequences caused by non-stationarity with several kinds of measurement indicators. ...
2020AAA0107400), NSFC (No. 12071145), STCSM (No. 19ZR141420, No. 20DZ1100304 and 20DZ1100300), Shanghai Trusted Industry Internet Software Collaborative Innovation Center, and the Fundamental Research Funds for ...
arXiv:2102.10616v2
fatcat:rzla4wpuk5bpdb47f3t5gagl2u
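One plausible formalization of the δ-stationarity mentioned above (our reading, not a definition quoted from the paper) is that consecutive joint policies stay within a divergence budget:

    % consecutive joint policies stay close, bounding how fast each
    % agent's effective environment drifts per update
    \max_{s}\, D_{\mathrm{KL}}\!\left(\pi_{t+1}(\cdot \mid s)\,\middle\|\,\pi_{t}(\cdot \mid s)\right) \;\le\; \delta

so that, from any single agent's perspective, the transition dynamics induced by the other agents change by at most a controlled amount between updates.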
FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer
[article]
2021
arXiv
pre-print
To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. ...
We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. ...
We thank Amazon Web services for computational support. ...
arXiv:2006.11419v4
fatcat:x3qzrb2wtve6hoplr76c6uhzii
Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey
[article]
2022
arXiv
pre-print
Each entity may need to make its local decision to improve the network performance under dynamic and uncertain network environments. ...
However, such an algorithm fails to model the cooperation or competition among network entities, and simply treats other entities as part of the environment, which may result in the non-stationarity ...
However, it may suffer from the non-stationarity issue, and the convergence of the learning policy cannot be guaranteed.
arXiv:2110.13484v2
fatcat:u2o5uxms65gmnp3q7xbh35l5oi
Coordinated Proximal Policy Optimization
[article]
2021
arXiv
pre-print
We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations. ...
MAPPO) under typical multi-agent settings, including cooperative matrix games and the StarCraft II micromanagement tasks. ...
The authors would like to thank Wenxuan Zhu for pointing out some of the mistakes in the proof, Xingzhou Lou and Xianjie Zhang for running some of the experiments, as well as Siling Chen for proofreading ...
arXiv:2111.04051v1
fatcat:mhqi4rds7zbcrktolczas2sgnu
A Survey and Critique of Multiagent Deep Reinforcement Learning
[article]
2019
arXiv
pre-print
for her visual designs for the figures in the article, to Frans Oliehoek, Sam Devlin, Marc Lanctot, Nolan Bard, Roberta Raileanu, Angeliki Lazaridou, and Yuhang Song for clarifications in their areas of ...
expertise, to Baoxiang Wang for his suggestions on recent deep RL works, to Michael Kaisers, Daan Bloembergen, and Katja Hofmann for their comments about the practical challenges of MDRL, to the editor ...
Self-play is a useful concept for learning algorithms (e.g., fictitious play [187]) since under certain classes of games it can guarantee convergence, and it has been used as a standard technique ...
arXiv:1810.05587v2
fatcat:h4ei5zx2xfa7xocktlefjrvef4
Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments
[article]
2021
arXiv
pre-print
Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. ...
The reason for that is the inherent non-stationarity of multi-agent environments (Laurent et al., 2011; Hernandez-Leal et al., 2017) . ...
However, CRS implies decentralized training and does not address several crucial issues of MARL, such as credit assignment, partial observability, and inherent non-stationarity (Agogino & Tumer, 2004; ...
arXiv:2102.12307v1
fatcat:sin7m3schzayjfgerdbldiuj3q
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
[article]
2021
arXiv
pre-print
the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. ...
Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. ...
This partial information aggravates the issues caused by non-stationarity, as the samples can hardly recover the exact behavior of the opponents' underlying policies, which increases the non-stationarity ...
arXiv:1911.10635v2
fatcat:ihlhtjlhnrdizbkcfzsnz5urfq
Applications of Reinforcement Learning in Deregulated Power Market: A Comprehensive Review
[article]
2022
arXiv
pre-print
The optimal bidding strategy and dispatching methodology under these new paradigms are prioritized concerns for both market participants and power system operators, with obstacles of uncertain characteristics ...
For each application, apart from a paradigmatic summary of generalized methodology, in-depth discussions of applicability and obstacles while deploying RL techniques are also provided. ...
Hence, the non-negativity of this advantage function can ensure that the policy is continuously and monotonically updated toward optimality, which is the key insight of the TRPO algorithm. ...
arXiv:2205.08369v1
fatcat:yqdarokpnzf4zitcilkgigq4vm
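The "key insight" in the last snippet is the performance-difference lemma (Kakade & Langford, 2002) underlying TRPO:

    % return of the new policy = return of the old policy
    % + expected discounted advantages under the new policy
    \eta(\tilde{\pi}) \;=\; \eta(\pi)
      + \mathbb{E}_{\tau \sim \tilde{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t)\right]

so an update whose expected advantage under the new policy is non-negative cannot decrease the return.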
Off-Policy Correction For Multi-Agent Reinforcement Learning
[article]
2022
arXiv
pre-print
Furthermore, our algorithm is theoretically grounded - we prove a fixed-point theorem that guarantees convergence. ...
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. ...
ACKNOWLEDGMENTS We thank Konrad Staniszewski for discussions and experiments in the prequel of this project. ...
arXiv:2111.11229v2
fatcat:2wce75drbnguvmrxfgo7kqmwsu
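Off-policy corrections in this family typically truncate importance weights, as in V-trace (Espeholt et al., 2018), which this line of multi-agent work builds on. A minimal sketch of that style of corrected value target follows; it illustrates the family, not the paper's exact estimator:

    import numpy as np

    def vtrace_like_targets(rewards, values, bootstrap, ratios,
                            gamma=0.99, rho_bar=1.0, c_bar=1.0):
        """V-trace-style off-policy correction: truncated importance
        weights rho and c keep the fixed point well-defined
        (illustrative sketch only)."""
        T = len(rewards)
        rho = np.minimum(ratios, rho_bar)
        c = np.minimum(ratios, c_bar)
        next_values = np.append(values[1:], bootstrap)
        deltas = rho * (rewards + gamma * next_values - values)
        vs = np.zeros(T)
        acc = 0.0
        for t in reversed(range(T)):
            # vs[t] - values[t] = delta_t + gamma * c_t * (vs[t+1] - values[t+1])
            acc = deltas[t] + gamma * c[t] * acc
            vs[t] = values[t] + acc
        return vs

    # Example: 3-step trajectory with target/behavior policy ratios.
    print(vtrace_like_targets(
        rewards=np.array([1.0, 0.0, 1.0]),
        values=np.array([0.5, 0.4, 0.6]),
        bootstrap=0.0,
        ratios=np.array([1.2, 0.8, 1.0])))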
Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning
[article]
2021
arXiv
pre-print
Our solution method is generic and can be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized. ...
As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness. ...
Training Schedule Because the decentralized execution of independent policies causes non-stationarity in the gathered experience, like previous MADRL methods (Foerster et al., 2016; Jiang & Lu, 2019) ...
arXiv:2012.09421v4
fatcat:c6vdqrk7trdldonkkehyiubhce
Showing results 1 — 15 out of 28 results