245 Hits in 6.5 sec

Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication [article]

Hardik Meisheri, Harshad Khadilkar
2020 arXiv   pre-print
The defining feature of our algorithm is achieving sample efficiency within a restrictive computational budget while beating the previous years' learning agents.  ...  We show that the proposed approach is able to achieve competitive performance within half a million games of training, significantly faster than other studies in the literature.  ...  Communication Protocol In the TeamRadio variant of Pommerman, limited communication between the agents is allowed.  ... 
arXiv:2011.00424v1 fatcat:yeqneqmxcrcvnig5ch3snvwa4a

Multi-agent reinforcement learning with approximate model learning for competitive games

Young Joon Park, Yoon Sang Cho, Seoung Bum Kim, Drew Fudenberg
2019 PLoS ONE  
In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement.  ...  The actor networks enable the agents to communicate using forward and backward paths, while the critic network helps to train the actors by delivering them gradient signals based on their contribution to  ...  With deterministic policy gradients, agents learn differentiable communication protocols for coordinating with others.  ... 
doi:10.1371/journal.pone.0222215 pmid:31509568 pmcid:PMC6739057 fatcat:4lp2babw2bcjxnytrqp6cqrnqq

Team-partitioned, opaque-transition reinforcement learning

Peter Stone, Manuela Veloso
1999 Proceedings of the third annual conference on Autonomous Agents - AGENTS '99  
Multi-agent scenarios are opaque-transition, as team members are not always in full communication with one another and adversaries may affect the environment.  ...  It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities.  ...  Conclusion TPOT-RL is an adaptation of RL to non-Markovian multi-agent domains with opaque transitions, large state spaces, hidden state and limited training opportunities.  ... 
doi:10.1145/301136.301195 dblp:conf/agents/StoneV99 fatcat:lvmn5vaw2rdepbcrctpw4uxzp4

Team-Partitioned, Opaque-Transition Reinforcement Learning [chapter]

Peter Stone, Manuela Veloso
1999 Lecture Notes in Computer Science  
Multi-agent scenarios are opaque-transition, as team members are not always in full communication with one another and adversaries may affect the environment.  ...  It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities.  ...  Conclusion TPOT-RL is an adaptation of RL to non-Markovian multi-agent domains with opaque transitions, large state spaces, hidden state and limited training opportunities.  ... 
doi:10.1007/3-540-48422-1_21 fatcat:lclqx5qoenhtlm6minc47o6chu

The Hanabi Challenge: A New Frontier for AI Research [article]

Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad (+3 others)
2019 arXiv   pre-print
In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker.  ...  We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and  ...  Acknowledgements We would like to thank many people: Matthieu d'Epenoux of Cocktail Games and Antoine Bauza, who designed Hanabi, for their support on this project; Alden Christianson for help with coordinating  ... 
arXiv:1902.00506v1 fatcat:ri6mermrefdffg4bvnixt5hdu4

Adversarially Guided Self-Play for Adopting Social Conventions [article]

Mycal Tucker, Yilun Zhou, Julie Shah
2020 arXiv   pre-print
Robotic agents must adopt existing social conventions in order to be effective teammates.  ...  Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them.  ...  Finally, our third domain tests three-agent teams in a multi-step coordination game involving multiple communication conventions.  ... 
arXiv:2001.05994v2 fatcat:xxrj6uguvvc3zpp6drt4ysyrbe

A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems [article]

Reuth Mirsky and Ignacio Carlucho and Arrasy Rahman and Elliot Fosong and William Macke and Mohan Sridharan and Peter Stone and Stefano V. Albrecht
2022 arXiv   pre-print
Ad hoc teamwork is the well-established research problem of designing agents that can collaborate with new teammates without prior coordination. This survey makes a two-fold contribution.  ...  Second, it discusses the progress that has been made in the field so far, and identifies the immediate and long-term open problems that need to be addressed in the field of ad hoc teamwork.  ...  Unlike multi-agent reinforcement learning (see Section 3.2), which supports joint training for all agents in the team, AHT does not assume that the deployed teammates are the same as those the learner  ... 
arXiv:2202.10450v1 fatcat:twqmsvosbjbepnyxf3f3v6hlba

School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget [article]

Omkar Shelke, Hardik Meisheri, Harshad Khadilkar
2021 arXiv   pre-print
Pommerman is a hybrid cooperative/adversarial multi-agent environment, with challenging characteristics in terms of partial observability, limited or no communication, sparse and delayed rewards, and restrictive  ...  In this paper, we focus on developing a curriculum for learning a robust and promising policy in a constrained computational budget of 100,000 games, starting from a fixed base policy (which is itself  ...  Conclusion In a hybrid cooperative/adversarial multi-agent game such as Pommerman, curriculum learning is a popular way of accelerating training.  ... 
arXiv:2102.11762v2 fatcat:il54s3qbmjdzlk6yos3kva22bq

From Motor Control to Team Play in Simulated Humanoid Football [article]

Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever (+10 others)
2021 arXiv   pre-print
that extend far beyond the body itself, ultimately involving coordination with other agents.  ...  However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment.  ...  Tobias Springenberg and Peter Stone for their insights and helpful comments; Tyler Liechty and Amy Merrick at DeepMind for assistance obtaining motion capture data; and Darren Carikas for assistance with  ... 
arXiv:2105.12196v1 fatcat:owunoflhgbfofo63hhqbkyg4hy

Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning [article]

Julien Roy, Paul Barde, Félix G. Harvey, Derek Nowrouzezahrai, Christopher Pal
2020 arXiv   pre-print
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents.  ...  Our experiments show improved performance across many cooperative multi-agent problems.  ...  We also acknowledge funding in support of this work from the Fonds de Recherche Nature et Technologies (FRQNT) and Mitacs, as well as Compute Canada for supplying computing resources.  ... 
arXiv:1908.02269v4 fatcat:rnk6zpb2sfcl7bzpct6zaupipq

Winning Isn't Everything: Enhancing Game Development with Intelligent Agents [article]

Yunqi Zhao, Igor Borovikov, Fernando de Mesentier Silva, Ahmad Beirami, Jason Rupert, Caedmon Somers, Jesse Harder, John Kolen, Jervis Pinto, Reza Pourabolghasem, James Pestrak, Harold Chaput (+5 others)
2020 arXiv   pre-print
In this paper, we study the problem of training intelligent agents in service of game development.  ...  Unlike the agents built to "beat the game", our agents aim to produce human-like behavior to help with game evaluation and balancing.  ...  ACKNOWLEDGEMENT The authors are thankful to EA Sports and other game team partners for their support and collaboration.  ... 
arXiv:1903.10545v5 fatcat:v2vc7pxzvrfbbiezdttz7mfyze

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning [article]

Qian Long, Zihan Zhou, Abhibav Gupta, Fei Fang, Yi Wu, Xiaolong Wang
2020 arXiv   pre-print
of training agents in a stage-wise manner.  ...  In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large  ...  In the adversarial battle game, similar to the grassland game, the entity types are the agent itself, other teammates, enemies, and food.  ... 
arXiv:2003.10423v1 fatcat:e7csz6bdprdbnc6nqxfb4l76ee

Continual Match Based Training in Pommerman: Technical Report [article]

Peng Peng, Liang Pang, Yufeng Yuan, Chao Gao
2018 arXiv   pre-print
with no communication.  ...  Specifically, we propose a COntinual Match BAsed Training (COMBAT) framework for training a population of advantage actor-critic (A2C) agents in Pommerman, a partially observable multi-agent environment  ...  INTRODUCTION Pommerman [17] is a multi-agent environment based on the classic console game Bomberman.  ... 
arXiv:1812.07297v1 fatcat:qm6vwut5ovhxhluak5b7yet6oa

Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts [article]

Sage Bergerson
2021 arXiv   pre-print
Methods which use recursive reasoning or updating also perform well, including the feedback NE and archive multi-agent adversarial IRL.  ...  Multi-agent inverse reinforcement learning (MIRL) can be used to learn reward functions from agents in social environments.  ...  In the initial policy step, an adversarial training algorithm solves for a NE strategy in a zero-sum stochastic game parameterized by the current reward function.  ... 
arXiv:2109.01178v1 fatcat:v635kuj4wfg4nndxvclbaubv2q

Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph [article]

Chuangchuang Sun, Macheng Shen, Jonathan P. How
2020 arXiv   pre-print
Through this sparsity structure, the agents can communicate both effectively and efficiently by selectively attending only to the agents that matter most, and thus the scale of the MARL problem  ...  The complexity of multiagent reinforcement learning (MARL) in multiagent systems increases exponentially with respect to the agent number.  ...  As a star graph is a connected graph with the minimum possible number of edges, this communication protocol is both effective and efficient.  ... 
arXiv:2003.01040v2 fatcat:ch3zgjpd2vcmbjpfis5nm3gglm
Showing results 1 — 15 out of 245 results