Penalty-Regulated Dynamics and Robust Learning Procedures in Games

Pierre Coucheney, Bruno Gaujal, Panayotis Mertikopoulos
2015 Mathematics of Operations Research  
Starting from a heuristic learning scheme for strategic n-person games, we derive a new class of continuous-time learning dynamics which consist of a replicator-like term adjusted by an entropic penalty that keeps players' strategies away from the boundary of the game's strategy space. These entropy-driven dynamics are equivalent to players taking an exponentially discounted aggregate of their ongoing payoffs and then using a quantal response choice model to pick an action based on these performance scores. Owing to this inherent duality, these dynamics satisfy a variant of the folk theorem of evolutionary game theory and converge to (arbitrarily precise) quantal approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality to design a discrete-time, payoff-based learning algorithm that retains these convergence properties and only requires players to observe their in-game payoffs. Moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and does not require any synchronization between players.
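The duality described above (discounted payoff scores fed into a quantal response) can be illustrated with a minimal sketch. It assumes logit (softmax) as the quantal response map, a single discount rate `T`, and an illustrative 2×2 identical-interest (hence potential) coordination game. The function names `logit`, `mean_dynamics` and `payoff_based`, the step-size schedule, and the importance-weighted payoff estimator are all hypothetical choices made for illustration, not the authors' exact algorithm.

```python
import numpy as np


def logit(y):
    """Logit (softmax) choice map: turns performance scores into a mixed strategy."""
    z = np.exp(y - y.max())  # subtract the max for numerical stability
    return z / z.sum()


def mean_dynamics(U, T, gamma=0.1, steps=2000):
    """Euler discretization of y' = u(x) - T*y, x = logit(y), for a two-player
    identical-interest game with common payoff matrix U (assumed setting).
    Each step y <- (1 - gamma*T)*y + gamma*u keeps y an exponentially
    discounted running aggregate of the payoff vectors u."""
    y1 = np.zeros(U.shape[0])
    y2 = np.zeros(U.shape[1])
    for _ in range(steps):
        x1, x2 = logit(y1), logit(y2)
        u1 = U @ x2    # player 1's expected payoff for each of its actions
        u2 = U.T @ x1  # player 2's expected payoff for each of its actions
        y1 = y1 + gamma * (u1 - T * y1)
        y2 = y2 + gamma * (u2 - T * y2)
    return logit(y1), logit(y2)


def payoff_based(U, T, rounds=5000, seed=0):
    """Hypothetical payoff-based variant: each player observes only the payoff of
    the action it actually played and feeds an importance-weighted estimate of
    its payoff vector into the same discounted score update (a stochastic
    approximation of mean_dynamics)."""
    rng = np.random.default_rng(seed)
    y1 = np.zeros(U.shape[0])
    y2 = np.zeros(U.shape[1])
    for n in range(1, rounds + 1):
        gamma = 1.0 / (n ** 0.6 + 10)  # assumed decreasing step size
        x1, x2 = logit(y1), logit(y2)
        a1 = rng.choice(len(x1), p=x1)
        a2 = rng.choice(len(x2), p=x2)
        payoff = U[a1, a2]  # the only feedback each player receives
        u1_hat = np.zeros_like(y1)
        u1_hat[a1] = payoff / x1[a1]  # unbiased estimate of U @ x2
        u2_hat = np.zeros_like(y2)
        u2_hat[a2] = payoff / x2[a2]  # unbiased estimate of U.T @ x1
        y1 = y1 + gamma * (u1_hat - T * y1)
        y2 = y2 + gamma * (u2_hat - T * y2)
    return logit(y1), logit(y2)


if __name__ == "__main__":
    U = np.array([[1.0, 0.0], [0.0, 2.0]])  # illustrative coordination game
    T = 0.5
    x1, x2 = mean_dynamics(U, T)
    # At a rest point T*y1 = U @ x2, so x1 = logit(U @ x2 / T): a logit QRE.
    print(x1, np.abs(x1 - logit(U @ x2 / T)).max())
    print(payoff_based(U, T))
```

In this sketch, a rest point of the mean dynamics satisfies `x1 = logit(U @ x2 / T)`, i.e. it is a logit quantal response equilibrium at temperature `T`. Lowering `T` makes that rest point a sharper quantal approximation of a Nash equilibrium, in line with the "arbitrarily precise" claim above. The payoff-based variant needs only each player's realized payoff, which is the feature the abstract emphasizes.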
doi:10.1287/moor.2014.0687