Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG [article]

Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong
2018 arXiv   pre-print
This attention mechanism introduces a special structure to explicitly model the dynamic joint policy of teammates, making sure that the collected information can be processed efficiently.  ...  Second, to model the teammates' policies using the collected information in an effective way, ATT-MADDPG enhances the centralized critic with an attention mechanism.  ...  This work was supported by the National Natural Science Foundation of China under Grant No.61572044. The contact author is Zhen Xiao.  ... 
arXiv:1811.07029v1 fatcat:aecqvgpqqjf6nmao3lcuzdtjou

Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning [article]

Julien Roy, Paul Barde, Félix G. Harvey, Derek Nowrouzezahrai, Christopher Pal
2020 arXiv   pre-print
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents.  ...  Finally, we analyze the effects of our proposed methods on the policies that our agents learn and show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors  ...  We also acknowledge funding in support of this work from the Fonds de Recherche Nature et Technologies (FRQNT) and Mitacs, as well as Compute Canada for supplying computing resources.  ... 
arXiv:1908.02269v4 fatcat:rnk6zpb2sfcl7bzpct6zaupipq

UAV Swarm Confrontation Using Hierarchical Multiagent Reinforcement Learning

Baolai Wang, Shengang Li, Xianzhong Gao, Tao Xie, Xingling Shao
2021 International Journal of Aerospace Engineering  
With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers' attention.  ...  In the proposed approach, UAV swarm is modeled as a large multiagent system (MAS) with an individual UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov  ...  Acknowledgments This research was funded by the National Science Foundation of China (grant No. 61472476) and Postgraduate Scientific Research Innovation Project of Hunan Province (grant No.  ... 
doi:10.1155/2021/3360116 fatcat:ftfcbaa2wrh3nfjkn4hhiwhhvy

Efficient Cooperation Strategy Generation in Multi-Agent Video Games via Hypergraph Neural Network [article]

Bin Zhang, Yunpeng Bai, Zhiwei Xu, Dapeng Li, Guoliang Fan
2022 arXiv   pre-print
However, researchers have extra difficulties while working with video games in multi-agent environments.  ...  One of the most pressing issues presently being addressed is how to create sufficient collaboration between different agents in a scenario with numerous agents.  ...  The attention mechanism is also used by ATT-MADDPG [10] to complete the dynamic modeling of teammates.  ... 
arXiv:2203.03265v1 fatcat:6mzsax3nwbgzfhpvfikz6ea7nm

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey [article]

Tianxu Li, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Qihui Wu, Yang Zhang, Bing Chen
2022 arXiv   pre-print
Multi-agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environments, but also other entities' policies.  ...  decision-making policy adaptively through interacting with the unknown environments.  ...  MAAC is an extension of SAC [59] to multi-agent environments, where each agent has a critic that shares a central attention mechanism with the critics of other agents.  ... 
arXiv:2110.13484v2 fatcat:u2o5uxms65gmnp3q7xbh35l5oi

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [article]

Wenhao Li and Bo Jin and Xiangfeng Wang and Junchi Yan and Hongyuan Zha
2020 arXiv   pre-print
From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning.  ...  Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes unpractical in complicated applications, due to non-interactivity between agents, curse of dimensionality and  ...  Figure 3: Extensions of off-policy and on-policy actor-critic joint gradient. F2A2-DDPG Figure 4: Modeling other agents' policies.  ... 
arXiv:2004.11145v1 fatcat:tjkbnp4ndre33dxwy5ah2op6hy

A Survey of Deep Reinforcement Learning in Video Games [article]

Kun Shao, Zhentao Tang, Yuanheng Zhu, Nannan Li, Dongbin Zhao
2019 arXiv   pre-print
This learning mechanism updates the policy to maximize the return with an end-to-end method.  ...  In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties.  ...  ACKNOWLEDGMENT The authors would like to thank Qichao Zhang, Dong Li and Weifan Li for the helpful comments and discussions about this work.  ... 
arXiv:1912.10944v2 fatcat:fsuzp2sjrfcgfkyclrsyzflax4

An Introduction to Multi-Agent Reinforcement Learning and Review of its Application to Autonomous Mobility [article]

Lukas M. Schmidt, Johanna Brosig, Axel Plinge, Bjoern M. Eskofier, Christopher Mutschler
2022 arXiv   pre-print
Multi-Agent Reinforcement Learning (MARL) is a research field that aims to find optimal solutions for multiple agents that interact with each other.  ...  Many scenarios in mobility and traffic involve multiple different agents that need to cooperate to find a joint solution.  ...  B.M.E. gratefully acknowledges support of the German Research Foundation (DFG) within the framework of the Heisenberg professorship program (Grant ES 434/8-1).  ... 
arXiv:2203.07676v1 fatcat:vzmuhswdlbeu7hedsqmp474nzu

Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training [article]

Piyush K. Sharma, Rolando Fernandez, Erin Zaroukian, Michael Dorothy, Anjon Basak, Derrik E. Asher
2021 arXiv   pre-print
Much work has been dedicated to the exploration of Multi-Agent Reinforcement Learning (MARL) paradigms implementing a centralized learning with decentralized execution (CLDE) approach to achieve human-like  ...  The goal is to explore how different implementations of information sharing mechanism in centralized learning may give rise to distinct group coordinated behaviors in multi-agent systems performing cooperative  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory  ... 
arXiv:2107.14316v1 fatcat:n7qmmwwdenfbdngkmzflsqcx7y

With Whom to Communicate: Learning Efficient Communication for Multi-Robot Collision Avoidance [article]

Álvaro Serra-Gómez, Bruno Brito, Hai Zhu, Jen Jen Chung, Javier Alonso-Mora
2020 arXiv   pre-print
Decentralized multi-robot systems typically perform coordinated motion planning by constantly broadcasting their intentions as a means to cope with the lack of a central system coordinating the efforts  ...  This paper presents an efficient communication method that solves the problem of "when" and with "whom" to communicate in multi-robot collision avoidance scenarios.  ...  our approach is the natural extension of DDPG to multi-agent environments, that is, the Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG) [16].  ... 
arXiv:2009.12106v1 fatcat:7hyhsmse45aj7ibfabih64g6ai

Distributed Reinforcement Learning for Robot Teams: A Review [article]

Yutong Wang and Mehul Damani and Pamela Wang and Yuhong Cao and Guillaume Sartoretti
2022 arXiv   pre-print
The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS).  ...  Purpose of review: Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated  ...  ATT-MADDPG uses attention on the centralized critic to explicitly model the dynamic joint policy of teammates in order to improve cooperation [52].  ... 
arXiv:2204.03516v1 fatcat:iga6xlexmjbbflvuv5pjhifggy

Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems [article]

Xiaotian Hao, Weixun Wang, Jianye Hao, Yaodong Yang
2019 arXiv   pre-print
However, learning is challenging in independent settings due to the local viewpoints of all agents, which perceive the world as a non-stationary environment due to the concurrently exploring teammates.  ...  Many tasks in practice require the collaboration of multiple agents through reinforcement learning.  ...  Experimental Results To make the different algorithms comparable, we pre-train both the rescue agents and wounded animal agents with DDPG and save the animal models during training.  ... 
arXiv:1909.11468v1 fatcat:glwziahczfcnbfkzgw4itviuli

TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations [article]

Shiyu Huang, Wenze Chen, Longfei Zhang, Shizhen Xu, Ziyang Li, Fengming Zhu, Deheng Ye, Ting Chen, Jun Zhu
2021 arXiv   pre-print
Extensive experiments further show that our pre-trained model can accelerate the training process of the modern multi-agent algorithm and our method achieves state-of-the-art performances on various academic  ...  To the best of our knowledge, Tikick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, while previous work could either control a single agent  ...  MADDPG: A policy-based multi-agent algorithm, which adapts the single-agent DDPG algorithm to the multi-agent setting.  ... 
arXiv:2110.04507v5 fatcat:h2u4yhlpvjg6jfikjtefzz4v5y

Reinforcement Learning from Hierarchical Critics [article]

Zehong Cao, Chin-Teng Lin
2020 arXiv   pre-print
Then, we test the proposed RLHC algorithm against the benchmark algorithm, proximal policy optimisation (PPO), for two experimental scenarios performed in a Unity environment consisting of tennis and soccer  ...  In this study, we investigate the use of global information to speed up the learning process and increase the cumulative rewards of reinforcement learning (RL) in competition tasks.  ...  The toolkit supports dynamic multi-agent interaction, and agents can be trained using RL through a straightforward Python API.  ... 
arXiv:1902.03079v4 fatcat:jictmt7advcdzfe4j5scqftzhm

A Review of Cooperative Multi-Agent Deep Reinforcement Learning [article]

Afshin OroojlooyJadid, Davood Hajinezhad
2021 arXiv   pre-print
Also, a list of available environments for MARL research is provided in this survey. Finally, the paper is concluded with proposals on the possible research directions.  ...  In particular, we have focused on five common approaches on modeling and solving cooperative multi-agent reinforcement learning problems: (I) independent learners, (II) fully observable critic, (III) value  ...  Also, using an attention model, it obtains the weights of all K action-sets such that the hidden vector $h^t_i$ of the attention model is generated via the actions of other agents ($a^t_{-i}$).  ... 
arXiv:1908.03963v4 fatcat:s2umqzxmqrhntkev3f6k554cv4