O(1/T) Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games [article]

James P. Bailey
2021 arXiv   pre-print
We introduce a generalization of zero-sum network multiagent matrix games and prove that alternating gradient descent converges to the set of Nash equilibria at rate O(1/T) for this set of games.  ...  Experimentally, we show with 97.5% confidence that alternating gradient descent obtains time-averaged strategies that are 2.585 times closer to the set of Nash equilibria than optimistic gradient descent.  ...  used due to their O(1/T) time-average convergence to the set of Nash equilibria in zero-sum games.  ... 
arXiv:2110.02482v1 fatcat:i7gupastqzfuld4edimtnjlbxa
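
As a companion to this entry, the alternating scheme is easy to reproduce in a toy zero-sum matrix game. The following is a minimal sketch, not the paper's implementation: the payoff matrix (rock-paper-scissors), learning rate, and simplex projection are illustrative assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / idx[rho], 0.0)

# Assumed example: rock-paper-scissors, payoff x^T A y to the row player.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

eta, T = 0.1, 10000
x = np.ones(3) / 3          # row player (maximizer)
y = np.ones(3) / 3          # column player (minimizer)
x_sum, y_sum = np.zeros(3), np.zeros(3)

for t in range(T):
    # Alternation: x updates first, then y reacts to the *new* x.
    x = project_simplex(x + eta * (A @ y))
    y = project_simplex(y - eta * (A.T @ x))
    x_sum += x
    y_sum += y

print("time-averaged strategies:", x_sum / T, y_sum / T)  # near uniform, the NE
```

The alternation (the second player reacts to the first player's already-updated strategy) is what distinguishes this from simultaneous gradient descent, whose time averages are only guaranteed to converge at the slower O(1/√T) rate.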

Solving Large Extensive-Form Games with Strategy Constraints

Trevor Davis, Kevin Waugh, Michael Bowling
2019 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Extensive-form games are a common model for multiagent interactions with imperfect information.  ...  In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints.  ...  CFR+ has been shown to converge at an initial rate faster than O(1/T) in a variety of games (Burch 2017, Sections 4.3-4.4). • Finally, CFR is not inherently limited to O(1/√T) worst-case convergence  ... 
doi:10.1609/aaai.v33i01.33011861 fatcat:bskqq2sza5g4fnvqrqsarpbybi
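
For readers unfamiliar with CFR+, its core update, regret-matching+, can be sketched in a few lines once the extensive-form machinery is stripped down to a normal-form game. This is an illustrative reduction under assumed inputs, not the authors' constrained solver:

```python
import numpy as np

def rm_plus(q):
    """Regret-matching+: mix proportionally to the clipped cumulative regrets."""
    s = q.sum()
    return q / s if s > 0 else np.ones_like(q) / len(q)

# Assumed example: rock-paper-scissors, payoff x^T A y to the row player.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

qx, qy = np.zeros(3), np.zeros(3)       # clipped cumulative regrets
x_avg, y_avg = np.zeros(3), np.zeros(3)

T = 10000
for t in range(1, T + 1):
    x, y = rm_plus(qx), rm_plus(qy)
    ux, uy = A @ y, -(A.T @ x)          # per-action utilities in the zero-sum game
    qx = np.maximum(qx + ux - x @ ux, 0.0)  # RM+: clip regrets at zero each step
    qy = np.maximum(qy + uy - y @ uy, 0.0)
    x_avg += t * x                      # CFR+ uses linearly weighted averaging
    y_avg += t * y

print(x_avg / x_avg.sum(), y_avg / y_avg.sum())  # approaches uniform, the NE
```

The combination of clipped regrets and linearly weighted strategy averaging is where the fast empirical convergence cited above is observed.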

Solving Large Extensive-Form Games with Strategy Constraints [article]

Trevor Davis and Kevin Waugh and Michael Bowling
2019 arXiv   pre-print
In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player.  ...  Extensive-form games are a common model for multiagent interactions with imperfect information.  ...  CFR+ has been shown to converge at an initial rate faster than O(1/T) in a variety of games (Burch 2017, Sections 4.3-4.4).  ... 
arXiv:1809.07893v2 fatcat:lkj5prgtazg2nnsdo6aaer5io4

Evolutionary Dynamics and Φ-Regret Minimization in Games [article]

Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls
2021 arXiv   pre-print
It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit  ...  Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games.  ...  Then the time-average opponent strategy observed by the row player up to time t is given by $\frac{\frac{1}{v_1 x}\,\nu_1 + \frac{1}{v_2 x}\,\nu_2}{\frac{1}{v_1 x} + \frac{1}{v_2 x}} + O\!\left(\frac{1}{t}\right)$. (8) Proof.  ... 
arXiv:2106.14668v1 fatcat:wrmagg2kyvegdjdxnsd7k5t42i

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory [article]

Yunlong Lu, Kai Yan
2020 arXiv   pre-print
Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of methods and applications.  ...  Fictitious self-play has become popular and has had a great impact on multi-agent reinforcement learning algorithms.  ...  A(I) is the action set in information set I. It is proved [87] that the regret bound is O(1/√T), and is further improved to O(1/T^0.75) by Farina et al. [88].  ... 
arXiv:2001.06487v3 fatcat:o2iovnsbxfgp5omk67jwuoonma

Evolutionary Dynamics and Phi-Regret Minimization in Games

Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls
2022 The Journal of Artificial Intelligence Research  
It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit  ...  Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games.  ...  This research project is supported in part by the National  ... 
doi:10.1613/jair.1.13187 fatcat:hpeo5bx4ujeovn7jfwxrrkk3hi

Convergence of Strategies in Simple Co-Adapting Games

Richard Mealing, Jonathan L. Shapiro
2015 Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII - FOGA '15  
Also, unlike in fictitious play, our variants converge to solutions in the difficult Shapley's and Jordan's games.  ...  Fictitious play is an old but popular algorithm that can converge to solutions, albeit slowly, in self-play in games like these.  ...  This is proven in Appendix C. If we set p* = q*, then the game is zero-sum; otherwise it is general-sum.  ... 
doi:10.1145/2725494.2725503 dblp:conf/foga/MealingS15 fatcat:twugekm45rafvjowkfxzgcybgq
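
Classical fictitious play, the baseline these variants improve on, best-responds to the opponent's empirical mixture of past actions. A minimal self-play sketch, with an assumed rock-paper-scissors payoff matrix:

```python
import numpy as np

# Assumed example: rock-paper-scissors, payoff x^T A y to the row player.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

counts_x = np.ones(3)  # empirical action counts, initialized uniformly
counts_y = np.ones(3)

for t in range(10000):
    x_emp = counts_x / counts_x.sum()
    y_emp = counts_y / counts_y.sum()
    counts_x[np.argmax(A @ y_emp)] += 1       # row best-responds to y's average
    counts_y[np.argmax(-(A.T @ x_emp))] += 1  # column best-responds to x's average

# In zero-sum games the empirical frequencies converge to a Nash equilibrium.
print(counts_x / counts_x.sum(), counts_y / counts_y.sum())
```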

On Sex, Evolution, and the Multiplicative Weights Update Algorithm [article]

Reshef Meir, David Parkes
2015 arXiv   pre-print
We further revise the implications for convergence and utility or fitness guarantees in coordination games. In contrast to the claim of Chastain et al.  ...  We consider a recent innovative theory by Chastain et al. on the role of sex in evolution [PNAS'14].  ...  We also acknowledge a useful correspondence with the authors of Chastain et al. [2014], who clarified many points about their paper. Any mistakes and misunderstandings remain our own.  ... 
arXiv:1502.05056v1 fatcat:wzlve3kiinaala7trcqqlf4weu
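
The multiplicative weights update (MWU) at the center of this debate has a compact form: each action's probability is reweighted exponentially by its expected payoff. A hedged sketch in an assumed two-action coordination game (not the setting or parameters of Chastain et al.):

```python
import numpy as np

# Assumed 2x2 coordination game: both players receive A[i, j].
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

eta = 0.1
x = np.array([0.4, 0.6])  # assumed initial mixtures
y = np.array([0.4, 0.6])

for t in range(500):
    gx, gy = A @ y, A.T @ x                 # expected payoff of each pure action
    x = x * np.exp(eta * gx); x /= x.sum()  # exponential reweighting (MWU)
    y = y * np.exp(eta * gy); y /= y.sum()

print(x, y)  # from this start, both players lock onto the payoff-dominant action
```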

Poincaré-Bendixson Limit Sets in Multi-Agent Learning [article]

Aleksander Czechowski, Georgios Piliouras
2022 arXiv   pre-print
Whereas convergence is often a property of learning algorithms in games satisfying a particular reward structure (e.g., zero-sum games), even basic learning models, such as the replicator dynamics, are  ...  Moreover, we provide simple conditions under which such behavior translates into efficiency guarantees, implying that FoReL learning achieves a time-averaged sum of payoffs at least as good as that of a  ...  Frans A. Oliehoek for his support and helpful advice.  ... 
arXiv:2102.00053v2 fatcat:noui63a6jrfz7pfpagmb5zoi4y
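
The replicator dynamics named here can be simulated with a forward-Euler discretization; the step size, initial conditions, and game below are assumptions chosen to exhibit the cycling (rather than converging) behavior the abstract refers to, while the time-averaged payoff still tracks the game's value:

```python
import numpy as np

# Assumed example: rock-paper-scissors, payoff x^T A y to the row player.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

dt, steps = 0.01, 100000
x = np.array([0.5, 0.3, 0.2])  # start away from equilibrium so orbits cycle
y = np.array([0.2, 0.5, 0.3])
payoff_sum = 0.0

for t in range(steps):
    fx, fy = A @ y, -(A.T @ x)        # fitness of each action
    x = x + dt * x * (fx - x @ fx)    # replicator: above-average actions grow
    y = y + dt * y * (fy - y @ fy)
    payoff_sum += x @ A @ y

print("time-averaged payoff:", payoff_sum / steps)  # close to the game value, 0
```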

A Comparison of Self-Play Algorithms Under a Generalized Framework [article]

Daniel Hernandez, Kevin Denamganai, Sam Devlin, Spyridon Samothrakis, James Alfred Walker
2020 arXiv   pre-print
The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model.  ...  This framework is framed as an approximation to a theoretical solution concept for multiagent training.  ...  Gupta for his insightful conversations and work on Nash averaging.  ... 
arXiv:2006.04471v1 fatcat:6jpsg2hpknb7jbkrd7pmhrbn5u

Robust No-Regret Learning in Min-Max Stackelberg Games [article]

Denizalp Goktas, Jiayi Zhao, Amy Greenwald
2022 arXiv   pre-print
The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e., zero-sum) games.  ...  In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy  ...  ACKNOWLEDGMENTS We thank several anonymous reviewers for their feedback on an earlier draft of this paper. This work was partially supported by NSF Grant CMMI-1761546.  ... 
arXiv:2203.14126v2 fatcat:ileui4p3uzd6hmzg55avk37iwu
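
Online mirror descent (OMD) with an entropic regularizer, the learning dynamic studied here, reduces to exponentiated-gradient updates over the simplex. The sketch below is a generic min-max instance with independent strategy sets, not the authors' Fisher-market simulation; the matrix and step size are assumptions:

```python
import numpy as np

# Assumed example: matching pennies, min-max objective x^T A y.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

eta, T = 0.05, 20000
x = np.array([0.7, 0.3])  # minimizer, started off-equilibrium
y = np.array([0.3, 0.7])  # maximizer
x_avg, y_avg = np.zeros(2), np.zeros(2)

for t in range(T):
    gx, gy = A @ y, A.T @ x
    x = x * np.exp(-eta * gx); x /= x.sum()  # entropic OMD = exponentiated gradient
    y = y * np.exp(eta * gy);  y /= y.sum()
    x_avg += x; y_avg += y

print(x_avg / T, y_avg / T)  # time averages approach the uniform saddle point
```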

Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning [article]

Weiming Liu, Bin Li, Julian Togelius
2022 arXiv   pre-print
In ReCFR, Recursive Substitute Values (RSVs) are learned and used to replace cumulative regrets. It is proven that ReCFR can converge to a Nash equilibrium at a rate of O(1/√T).  ...  Counterfactual Regret Minimization (CFR) has achieved many fascinating results in solving large-scale Imperfect Information Games (IIGs).  ...  We prove that Recursive CFR converges to a Nash equilibrium asymptotically, and converges at a rate of O(1/√T) in a certain case.  ... 
arXiv:2012.01870v3 fatcat:4mwqaw6ccvdk3a22yzhhky33xq
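
Convergence claims like these are usually verified by measuring exploitability: how much each player could gain by deviating to a best response. A minimal normal-form version (the game is an assumed example):

```python
import numpy as np

def exploitability(A, x, y):
    """Sum of best-response gains in the zero-sum game with payoff x^T A y
    to the row (maximizing) player; zero exactly at a Nash equilibrium."""
    value = x @ A @ y
    br_row = np.max(A @ y)       # row's best pure response vs. y
    br_col = np.max(-(A.T @ x))  # column's best pure response vs. x
    return (br_row - value) + (br_col + value)

# Assumed example: rock-paper-scissors.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

uniform = np.ones(3) / 3
print(exploitability(A, uniform, uniform))                    # 0.0: uniform is the NE
print(exploitability(A, np.array([1.0, 0.0, 0.0]), uniform))  # pure rock is exploitable
```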

A Survey of Decision Making in Adversarial Games [article]

Xiuxian Li, Min Meng, Yiguang Hong, Jie Chen
2022 arXiv   pre-print
Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, and zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal  ...  O(1/T) time-average convergence was established by using alternating gradient descent in [86], where T is the time horizon.  ... 
arXiv:2207.07971v1 fatcat:egfm3ha6hrbijcu4rhpanspz2i

Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information [article]

Ceyhun Eksin, Alejandro Ribeiro
2016 arXiv   pre-print
A multi-agent system operates in an uncertain environment about which agents have different and time-varying beliefs that, as time progresses, converge to a common belief.  ...  We exemplify the use of the algorithm in coordination and target covering games.  ...  $\frac{1}{t} = O\!\left(\frac{1}{t}\right)$. (62) Now consider the row-stochastic matrix $W_{jl}$.  ... 
arXiv:1602.02066v1 fatcat:kv2mohq535f23bwsufn2i4st3i
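
The row-stochastic matrix W in the excerpt plays the standard consensus-averaging role: repeatedly mixing neighbors' beliefs drives all agents toward the common belief the abstract describes. A hedged sketch with an assumed three-agent network, not the paper's model:

```python
import numpy as np

# Assumed row-stochastic mixing matrix over three agents (rows sum to one).
W = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])

# Each row is one agent's belief over two states of the environment.
beliefs = np.array([[0.9, 0.1],
                    [0.5, 0.5],
                    [0.1, 0.9]])

for t in range(100):
    beliefs = W @ beliefs  # every agent averages its neighbors' beliefs

print(beliefs)  # all rows agree: the beliefs have reached consensus
```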

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games [article]

Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
2020 arXiv   pre-print
This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game.  ...  The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games  ...  It is proven that as T → ∞, CFR-D converges to an O(1/√T)-Nash equilibrium [16].  ... 
arXiv:2007.13544v2 fatcat:v7ox7iqki5biloghgfkusezcja