MOTS: Minimax Optimal Thompson Sampling
[article]
2020
arXiv
pre-print
Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity of implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound Ω(√(KT)) for K-armed bandit problems, where T is the total time horizon. In this paper, we solve this long-standing open problem by proposing a variant of Thompson
arXiv:2003.01803v3
fatcat:wjtbnm2yincg5dxusu4fziphjm