O(1/T) Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games
[article]
2021
arXiv
pre-print
We introduce a generalization of zero-sum network multiagent matrix games and prove that alternating gradient descent converges to the set of Nash equilibria at rate O(1/T) for this set of games. ...
Experimentally, we show with 97.5% confidence that alternating gradient descent yields time-averaged strategies that are 2.585 times closer to the set of Nash equilibria than optimistic gradient descent. ...
used due to their O(1/T) time-average convergence to the set of Nash equilibria in zero-sum games. ...
arXiv:2110.02482v1
fatcat:i7gupastqzfuld4edimtnjlbxa
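The first result above leans on alternating (rather than simultaneous) gradient updates. As a hedged illustration of that idea, here is a minimal sketch on the simplest bilinear saddle-point problem, min_x max_y x·y, whose equilibrium is (0, 0); the objective, step size, and function names are illustrative assumptions, not the paper's setup:

```python
# Minimal sketch: alternating gradient descent on min_x max_y f(x, y) = x * y.
# The unique equilibrium is (0, 0). With simultaneous updates the iterates
# spiral outward on this problem; with alternating updates they stay bounded
# and the time-averaged iterates approach the equilibrium.

def alternating_gd(x=1.0, y=1.0, eta=0.1, steps=10_000):
    x_sum = y_sum = 0.0
    for _ in range(steps):
        x -= eta * y          # descent step for the minimizing player
        y += eta * x          # ascent step sees the freshly updated x
        x_sum += x
        y_sum += y
    return x_sum / steps, y_sum / steps  # time-averaged strategies

x_bar, y_bar = alternating_gd()
# both time-averages end up close to the equilibrium value 0
```

The key detail is that the ascent step reads the already-updated x; swapping the two lines back to a simultaneous update makes the last iterates diverge on this objective.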
Solving Large Extensive-Form Games with Strategy Constraints
2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
Extensive-form games are a common model for multiagent interactions with imperfect information. ...
In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. ...
CFR+ has been shown to converge with an initial rate faster than O(1/T) in a variety of games (Burch 2017, Sections 4.3-4.4). • Finally, CFR is not inherently limited to O(1/√T) worst-case convergence ...
doi:10.1609/aaai.v33i01.33011861
fatcat:bskqq2sza5g4fnvqrqsarpbybi
Solving Large Extensive-Form Games with Strategy Constraints
[article]
2019
arXiv
pre-print
In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. ...
Extensive-form games are a common model for multiagent interactions with imperfect information. ...
CFR+ has been shown to converge with an initial rate faster than O(1/T) in a variety of games (Burch 2017, Sections 4.3-4.4). ...
arXiv:1809.07893v2
fatcat:lkj5prgtazg2nnsdo6aaer5io4
Evolutionary Dynamics and Φ-Regret Minimization in Games
[article]
2021
arXiv
pre-print
It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit ...
Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. ...
Then the time-average opponent strategy observed by the row player up to time t is given by $\frac{\frac{1}{v_1^x}\nu_1 + \frac{1}{v_2^x}\nu_2}{\frac{1}{v_1^x} + \frac{1}{v_2^x}} + O\!\left(\frac{1}{t}\right)$. (8) Proof. ...
arXiv:2106.14668v1
fatcat:wrmagg2kyvegdjdxnsd7k5t42i
Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory
[article]
2020
arXiv
pre-print
Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of methods and applications. ...
Fictitious self-play has become popular and has had a great impact on multi-agent reinforcement learning algorithms. ...
A(I) is the action set in information set I. It is proved [87] that the regret bound is O(1/√T), and is further improved to O(1/T^0.75) by Farina et al. [88]. ...
arXiv:2001.06487v3
fatcat:o2iovnsbxfgp5omk67jwuoonma
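The O(1/√T) regret bound quoted in this entry is characteristic of regret-matching-style learners such as CFR. As a hedged, self-contained illustration (plain regret matching on matching pennies, not any of the surveyed algorithms; the seed regrets are an assumption to break symmetry), the time-averaged strategies drift toward the uniform Nash equilibrium:

```python
# Illustrative sketch: regret matching in self-play on matching pennies.
# Each player mixes proportionally to positive cumulative regret; the
# time-averaged strategies approach the Nash equilibrium (0.5, 0.5)
# at the standard O(1/sqrt(T)) rate.

A = [[1.0, -1.0], [-1.0, 1.0]]  # row player's payoff matrix

def strategy(regret):
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

def regret_matching(T=5000):
    # small asymmetric seeds move the dynamics off the symmetric fixed point
    reg_row, reg_col = [1.0, 0.0], [0.0, 1.0]
    sum_row, sum_col = [0.0, 0.0], [0.0, 0.0]
    for _ in range(T):
        p, q = strategy(reg_row), strategy(reg_col)
        # expected utility of each pure action against the opponent's mix
        u_row = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
        u_col = [sum(-A[i][j] * p[i] for i in range(2)) for j in range(2)]
        ev_row = sum(p[i] * u_row[i] for i in range(2))
        ev_col = sum(q[j] * u_col[j] for j in range(2))
        for i in range(2):
            reg_row[i] += u_row[i] - ev_row
            reg_col[i] += u_col[i] - ev_col
            sum_row[i] += p[i]
            sum_col[i] += q[i]
    return [s / T for s in sum_row], [s / T for s in sum_col]

avg_row, avg_col = regret_matching()
```

Because the game is zero-sum, the sum of both players' average regrets upper-bounds the exploitability of the average strategy profile, which is what drives the averages toward (0.5, 0.5).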
Evolutionary Dynamics and Phi-Regret Minimization in Games
2022
The Journal of Artificial Intelligence Research
It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit ...
Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. ...
This research project is supported in part by the National ...
doi:10.1613/jair.1.13187
fatcat:hpeo5bx4ujeovn7jfwxrrkk3hi
Convergence of Strategies in Simple Co-Adapting Games
2015
Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII - FOGA '15
Also, unlike in fictitious play, our variants converge to solutions in the difficult Shapley's and Jordan's games. ...
Fictitious play is an old but popular algorithm that can converge to solutions, albeit slowly, in selfplay in games like these. ...
This is proven in Appendix C. If we set p* = q*, then the game is zero-sum; otherwise it is general-sum. ...
doi:10.1145/2725494.2725503
dblp:conf/foga/MealingS15
fatcat:twugekm45rafvjowkfxzgcybgq
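Fictitious play, discussed in this entry, is easy to sketch: each player best-responds to the opponent's empirical action frequencies so far. The following is a minimal illustrative version on matching pennies; the seed counts and first-index tie-breaking are assumptions, not the paper's variants:

```python
# Illustrative sketch of fictitious play on matching pennies.
# Each round, both players simultaneously best-respond to the opponent's
# empirical frequencies; by Robinson's theorem the empirical distributions
# converge (slowly) to the Nash equilibrium (0.5, 0.5) in zero-sum games.

def best_response(utils):
    return max(range(len(utils)), key=lambda i: utils[i])  # ties -> first index

def fictitious_play(T=20000):
    A = [[1, -1], [-1, 1]]                      # row player's payoffs
    row_counts, col_counts = [1, 0], [0, 1]     # seeded beliefs break symmetry
    for _ in range(T):
        r = best_response([A[i][0] * col_counts[0] + A[i][1] * col_counts[1]
                           for i in range(2)])
        c = best_response([-(A[0][j] * row_counts[0] + A[1][j] * row_counts[1])
                           for j in range(2)])
        row_counts[r] += 1
        col_counts[c] += 1
    return ([n / sum(row_counts) for n in row_counts],
            [n / sum(col_counts) for n in col_counts])

row_freq, col_freq = fictitious_play()
```

Note that only the empirical frequencies converge here; the stage-game actions themselves keep cycling, which is the slow convergence the snippet alludes to.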
On Sex, Evolution, and the Multiplicative Weights Update Algorithm
[article]
2015
arXiv
pre-print
We further revise the implications for convergence and utility or fitness guarantees in coordination games. In contrast to the claim of Chastain et al. ...
We consider a recent innovative theory by Chastain et al. on the role of sex in evolution [PNAS'14]. ...
We also acknowledge a useful correspondence with the authors of Chastain et al. [2014] , who clarified many points about their paper. Any mistakes and misunderstandings remain our own. ...
arXiv:1502.05056v1
fatcat:wzlve3kiinaala7trcqqlf4weu
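The multiplicative weights update rule at the heart of this line of work admits a very short sketch. The example below is an assumption-laden toy with two experts, unrelated to the paper's evolutionary setting; it shows weight concentrating exponentially fast on the lowest-loss expert:

```python
# Minimal sketch of the multiplicative weights update (MWU) rule:
# each expert's weight is multiplied by (1 - eta * loss) every round,
# so weight concentrates on experts with low cumulative loss.

def mwu(loss_rounds, eta=0.1):
    n = len(loss_rounds[0])
    w = [1.0] * n
    for losses in loss_rounds:          # losses assumed to lie in [0, 1]
        w = [wi * (1.0 - eta * li) for wi, li in zip(w, losses)]
    total = sum(w)
    return [wi / total for wi in w]     # normalized mixed strategy

# expert 0 always incurs loss 0, expert 1 always incurs loss 1
weights = mwu([[0.0, 1.0]] * 100)
# nearly all weight ends up on expert 0
```

After 100 rounds the losing expert's weight has been multiplied by 0.9 a hundred times, which is the exponential decay underlying MWU's regret guarantee.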
Poincaré-Bendixson Limit Sets in Multi-Agent Learning
[article]
2022
arXiv
pre-print
Whereas convergence is often a property of learning algorithms in games satisfying a particular reward structure (e.g., zero-sum games), even basic learning models, such as the replicator dynamics, are ...
Moreover, we provide simple conditions under which such behavior translates into efficiency guarantees, implying that FoReL learning achieves time-averaged sum of payoffs at least as good as that of a ...
Frans A. Oliehoek for his support and helpful advice. ...
arXiv:2102.00053v2
fatcat:noui63a6jrfz7pfpagmb5zoi4y
A Comparison of Self-Play Algorithms Under a Generalized Framework
[article]
2020
arXiv
pre-print
The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model. ...
This framework is framed as an approximation to a theoretical solution concept for multiagent training. ...
Gupta for his insightful conversations and work on Nash averaging. ...
arXiv:2006.04471v1
fatcat:6jpsg2hpknb7jbkrd7pmhrbn5u
Robust No-Regret Learning in Min-Max Stackelberg Games
[article]
2022
arXiv
pre-print
The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e., zero-sum) games. ...
In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy ...
ACKNOWLEDGMENTS We thank several anonymous reviewers for their feedback on an earlier draft of this paper. This work was partially supported by NSF Grant CMMI-1761546. ...
arXiv:2203.14126v2
fatcat:ileui4p3uzd6hmzg55avk37iwu
Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning
[article]
2022
arXiv
pre-print
In ReCFR, Recursive Substitute Values (RSVs) are learned and used to replace cumulative regrets. It is proven that ReCFR can converge to a Nash equilibrium at a rate of O(1/√T). ...
Counterfactual Regret Minimization (CFR) has achieved many fascinating results in solving large-scale Imperfect Information Games (IIGs). ...
We prove that Recursive CFR converges to a Nash equilibrium asymptotically, and converges at a rate of O(1/√T) in a certain case. ...
arXiv:2012.01870v3
fatcat:4mwqaw6ccvdk3a22yzhhky33xq
A Survey of Decision Making in Adversarial Games
[article]
2022
arXiv
pre-print
Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games ...
, zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal ...
O(1/T) time-average convergence was established using alternating gradient descent in [86], where T is the time horizon. ...
arXiv:2207.07971v1
fatcat:egfm3ha6hrbijcu4rhpanspz2i
Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information
[article]
2016
arXiv
pre-print
A multi-agent system operates in an uncertain environment about which agents have different and time varying beliefs that, as time progresses, converge to a common belief. ...
We exemplify the use of the algorithm in coordination and target covering games. ...
1/t = O(1/t). (62) Now consider the row-stochastic matrix W_{jl}. ...
arXiv:1602.02066v1
fatcat:kv2mohq535f23bwsufn2i4st3i
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
[article]
2020
arXiv
pre-print
This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. ...
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games ...
It is proven that as T → ∞, CFR-D converges to an O(1/√T)-Nash equilibrium [16]. ...
arXiv:2007.13544v2
fatcat:v7ox7iqki5biloghgfkusezcja
Showing results 1 — 15 out of 76 results