A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
The online multi-armed bandit problem and its generalizations are repeated decision making problems, where the goal is to select one of several possible decisions in every round, and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the regret of the algorithm. The term bandit refers to the setting where one only obtains the cost of thedoi:10.1137/1.9781611973068.5 fatcat:kij2svx5gfhgrgwth2dxd4ki4m