Bandits with switching costs
2014
Proceedings of the 46th Annual ACM Symposium on Theory of Computing - STOC '14
We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's T-round minimax regret in this setting is Θ(T^{2/3}), thereby closing a fundamental gap in our understanding of learning with bandit feedback. In the corresponding full-information version of the problem, the minimax regret is known to grow at a much slower rate of Θ(√T). The difference between these two rates provides the first
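The regret notion the abstract refers to can be made concrete with a minimal sketch. The function name `switching_cost_regret` and the loss-table layout are illustrative assumptions, not taken from the paper, and the losses are assumed to be fixed in advance (an oblivious adversary). The player pays the loss of each action played plus one unit per switch and is compared against the best single fixed action, which never switches.

```python
from typing import Sequence


def switching_cost_regret(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
) -> float:
    """Regret of an action sequence when every switch costs one unit.

    losses[t][a] is the loss of action a in round t (hypothetical layout);
    actions[t] is the action the player chose in round t.
    """
    if len(losses) != len(actions):
        raise ValueError("losses and actions must cover the same rounds")
    if not losses:
        return 0.0

    # Total loss of the actions actually played.
    played = sum(losses[t][a] for t, a in enumerate(actions))

    # One unit of cost for every round whose action differs from the previous one.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

    # Best fixed action in hindsight; it never switches, so it pays no switching cost.
    num_actions = len(losses[0])
    best_fixed = min(sum(row[a] for row in losses) for a in range(num_actions))

    return played + switches - best_fixed


# Example: two actions, four rounds.
example_losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
chasing = switching_cost_regret(example_losses, [0, 1, 0, 0])
staying = switching_cost_regret(example_losses, [0, 0, 0, 0])
```

In this example the player who chases the zero-loss action every round pays two switches and ends up with higher regret than the player who stays on action 0 throughout, which illustrates why switching costs change the problem.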
doi:10.1145/2591796.2591868
dblp:conf/stoc/DekelDKP14
fatcat:xkj74tb3qbfqtjtusftuuxccc4