D3C: Reducing the Price of Anarchy in Multi-Agent Learning [article]

Ian Gemp, Kevin R. McKee, Richard Everett, Edgar A. Duéñez-Guzmán, Yoram Bachrach, David Balduzzi, Andrea Tacchetti
2022 arXiv pre-print
We derive a differentiable upper bound on a price of anarchy that agents can cheaply estimate during learning. ... Agents do so by learning to mix their reward (equivalently, negative loss) with that of other agents by following the gradient of our derived upper bound. We refer to this approach as D3C. ... Acknowledgments: We are grateful to Jan Balaguer for fruitful discussions and advice on revising parts of the manuscript. ...
arXiv:2010.00575v5 fatcat:k5rjaw7xlzbf5oyzaap4sdfgzm
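
The reward-mixing step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the mixing itself: each agent's training signal becomes a weighted combination of all agents' rewards. The softmax parameterization of the weights and the names mix_rewards, mixing_logits, selfish_logits, and uniform_logits are assumptions made for illustration and are not taken from the paper. The upper bound on the price of anarchy and its gradient are not reproduced here, because this listing does not state them.

```python
import math


def softmax(logits):
    """Turn a list of real-valued logits into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_rewards(rewards, mixing_logits):
    """Return one mixed reward per agent.

    rewards:       list of n raw rewards, one per agent.
    mixing_logits: n rows of n logits (hypothetical parameterization);
                   row i gives agent i's unnormalized weights over agents.
    """
    mixed = []
    for row_logits in mixing_logits:
        weights = softmax(row_logits)
        mixed.append(sum(w * r for w, r in zip(weights, rewards)))
    return mixed


rewards = [1.0, 3.0]

# Nearly all weight on an agent's own reward: close to purely selfish learning.
selfish_logits = [[50.0, 0.0], [0.0, 50.0]]

# Equal weight on every agent: each agent optimizes the average reward.
uniform_logits = [[0.0, 0.0], [0.0, 0.0]]

print(mix_rewards(rewards, selfish_logits))
print(mix_rewards(rewards, uniform_logits))
```

With the selfish logits each agent's mixed reward stays essentially equal to its own reward, while with the uniform logits both agents receive the average of the two rewards. In the approach the abstract describes, the mixing weights would be learned by following the gradient of the derived bound rather than fixed by hand as they are here.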