Learning to compete, compromise, and cooperate in repeated general-sum games

Jacob W. Crandall, Michael A. Goodrich
Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games against other learning agents due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs less than the minimax value of the game. Second, an agent should learn to cooperate/compromise when beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays (in self play) the second property in a wide range of games.
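The first property refers to the game's minimax (security) value: the payoff a player can guarantee even against a worst-case opponent. A minimal sketch of this idea for pure strategies, using the standard Prisoner's Dilemma as a hypothetical example (the payoff numbers are illustrative, not taken from the paper):

```python
# Pure-strategy maximin (security) value of a matrix game.
# In general the minimax value allows mixed strategies and is found by
# solving a linear program; for the Prisoner's Dilemma the pure-strategy
# value happens to coincide with it.

def pure_maximin(payoffs):
    """Best payoff the row player can guarantee against a worst-case opponent."""
    return max(min(row) for row in payoffs)

# Row player's payoffs: rows = own action (C, D), cols = opponent's action (C, D).
PD = [
    [3, 0],  # cooperate: 3 if opponent cooperates, 0 if opponent defects
    [5, 1],  # defect:    5 if opponent cooperates, 1 if opponent defects
]

security = pure_maximin(PD)    # defecting guarantees at least 1
print(security)                # 1
mutual_cooperation = PD[0][0]  # 3 > 1: compromise can beat the security level
```

This also illustrates why the second property matters: mutual cooperation (payoff 3) strictly exceeds the security level (1), so an agent that only safeguards its minimax value still leaves payoff on the table unless it learns to cooperate.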
doi:10.1145/1102351.1102372 dblp:conf/icml/CrandallG05 fatcat:twtgdip535budmnwcj2kz2hsce