A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017.
The file type is application/pdf.
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments
2008
Theoretical Computer Science
The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions ("experts") under partial observation: in each round t, only the cost of the decision played is observed. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decision. It is known that an adversary controlling the costs of the decisions can force on the player a regret
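To make the setting concrete, here is a minimal Python sketch of the finite-decision case using a loss-based variant of Exp3 (exponential weights over importance-weighted cost estimates), the algorithm family introduced by Auer et al. It is not the algorithm proposed in this paper, which targets countably many decisions and unbounded costs. Assumptions made for illustration: finitely many decisions K, costs in [0, 1] fixed in advance (an oblivious adversary), and a caller-chosen learning rate eta. The function names exp3 and regret are hypothetical. Regret is measured as the player's total cost minus the total cost of the best single decision in hindsight. For this variant with eta = sqrt(2 ln K / (T K)), the expected regret is known to be at most sqrt(2 T K ln K), which is sublinear in T.

```python
import math
import random


def exp3(costs, eta, seed=0):
    """Loss-based Exp3 over K decisions (illustrative sketch).

    costs: T rows of K costs in [0, 1], fixed in advance (assumed oblivious
    adversary). In round t only costs[t][arm] of the played arm is read,
    mirroring the partial-observation setting.
    Returns (total cost incurred, list of decisions played).
    """
    rng = random.Random(seed)
    k = len(costs[0])
    est_cum = [0.0] * k  # importance-weighted cumulative cost estimates
    total, played = 0.0, []
    for row in costs:
        # Exponential weights on estimated costs; subtracting the minimum
        # keeps every exponent <= 0 for numerical stability.
        m = min(est_cum)
        w = [math.exp(-eta * (c - m)) for c in est_cum]
        s = sum(w)
        p = [x / s for x in w]
        arm = rng.choices(range(k), weights=p)[0]
        cost = row[arm]  # the only feedback observed this round
        total += cost
        played.append(arm)
        # Unbiased estimate: cost / p[arm] for the played arm, 0 for the rest.
        est_cum[arm] += cost / p[arm]
    return total, played


def regret(costs, total):
    """Player's total cost minus that of the best fixed decision in hindsight."""
    k = len(costs[0])
    best = min(sum(row[i] for row in costs) for i in range(k))
    return total - best


if __name__ == "__main__":
    T, K = 5000, 5
    env = random.Random(1)
    # Hypothetical cost table: decision 2 is cheaper on average than the rest.
    costs = [[env.random() * (0.5 if i == 2 else 1.0) for i in range(K)]
             for _ in range(T)]
    eta = math.sqrt(2 * math.log(K) / (T * K))
    total, played = exp3(costs, eta)
    print("average regret per round:", regret(costs, total) / T)
    print("most played decision:", max(range(K), key=played.count))
```

The usage block draws a hypothetical random cost table in which one decision is cheaper. The average regret per round should be small and shrink as T grows, which is what "sublinear regret" means in practice.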
doi:10.1016/j.tcs.2008.02.024
fatcat:6bzklixrnbdzjhlpc26sismy74