A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Online Learning in Unknown Markov Games
[article]
2021
arXiv
pre-print
We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret by competing with the minimax value of the game, and present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes. This is the first
arXiv:2010.15020v2
fatcat:w6u272f33jg7fpq4gi4pnolmuu