Playing in stochastic environment: from multi-armed bandits to two-player games

Wieslaw Zielonka, Marc Herbstritt
2010 Foundations of Software Technology and Theoretical Computer Science  
Given a zero-sum infinite game, we examine the question of whether the players have optimal memoryless deterministic strategies. It turns out that, under some general conditions, the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits.
doi:10.4230/lipics.fsttcs.2010.65 dblp:conf/fsttcs/Zielonka10 fatcat:kwcsmza5grfafnd3eoqeuqvlhu