We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (information shared by the player and the opponent), which define two learning scenarios: (1) The opponent is constrained, i.e., he provides rewards that are stochastic functions of equivalence classes defined by some model θ* ∈ Θ. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is …

dblp:journals/jmlr/MaillardM11 fatcat:qpote7g7bbd2xpodno7yegmg7i
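Scenario (1) can be made concrete with a toy sketch. It is not the paper's algorithm, and the abstract does not say which models Θ are used. Everything below is a hypothetical assumption: the model θ* maps the common history to the last arm the player pulled (with the empty history merged into class 0), rewards are Bernoulli with mean `mu[class][arm]`, and the benchmark "best history-dependent strategy" is approximated by the best deterministic class-based policy. Because the player's own actions change the class, that benchmark can beat greedy per-class play. The names `equivalence_class`, `draw_reward`, `long_run_average`, `best_history_policy`, `greedy_policy` and `simulate` are all illustrative.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical model theta*: the class of the common history is the last arm
# pulled; the empty history is assumed to fall into class START.
START = 0


def equivalence_class(history: Sequence[int]) -> int:
    """Map the common history (past arms) to its equivalence class under theta*."""
    return history[-1] if history else START


def draw_reward(mu: List[List[float]], history: Sequence[int], arm: int,
                rng: random.Random) -> float:
    """Constrained opponent: Bernoulli reward whose mean depends only on (class, arm)."""
    c = equivalence_class(history)
    return 1.0 if rng.random() < mu[c][arm] else 0.0


def long_run_average(mu: List[List[float]], policy: Tuple[int, ...]) -> float:
    """Exact long-run expected reward of a deterministic class-based policy.

    The next class is the arm just played, so the policy induces a deterministic
    walk over classes that enters a cycle; the long-run average is the mean of
    the expected rewards along that cycle.
    """
    seen: Dict[int, int] = {}
    path: List[int] = []
    c = START
    while c not in seen:
        seen[c] = len(path)
        path.append(c)
        c = policy[c]  # next class = arm played in class c
    cycle = path[seen[c]:]
    return sum(mu[k][policy[k]] for k in cycle) / len(cycle)


def best_history_policy(mu: List[List[float]]) -> Tuple[int, ...]:
    """Brute-force the best class -> arm mapping (feasible only for tiny K)."""
    k = len(mu)
    return max(itertools.product(range(k), repeat=k),
               key=lambda p: long_run_average(mu, p))


def greedy_policy(mu: List[List[float]]) -> Tuple[int, ...]:
    """Pick the arm with the highest immediate mean in each class."""
    return tuple(max(range(len(row)), key=lambda a: row[a]) for row in mu)


def simulate(mu: List[List[float]], policy: Tuple[int, ...], n: int,
             seed: int = 0) -> float:
    """Play `policy` for n rounds against the stochastic opponent; return total reward."""
    rng = random.Random(seed)
    history: List[int] = []
    total = 0.0
    for _ in range(n):
        arm = policy[equivalence_class(history)]
        total += draw_reward(mu, history, arm, rng)
        history.append(arm)
    return total


if __name__ == "__main__":
    mu = [[0.25, 0.875], [0.375, 0.5]]  # hypothetical mu[class][arm]
    best, greedy = best_history_policy(mu), greedy_policy(mu)
    print(best, long_run_average(mu, best))      # (1, 0) 0.625
    print(greedy, long_run_average(mu, greedy))  # (1, 1) 0.5
```

With these made-up means, the greedy policy settles in class 1 and earns 0.5 per round. Alternating arms earns 0.625, so a learner that only competes with per-class greedy play would still suffer linear regret w.r.t. the history-dependent benchmark. Brute-force enumeration is used purely for clarity and grows as K^K.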