Thompson Sampling for Bandits with Clustered Arms
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021)
We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative
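The multi-level scheme described above can be illustrated with a small sketch. The excerpt does not give the algorithm's exact details, so the following is only a hypothetical two-level Thompson sampling implementation for a Bernoulli bandit with clustered arms: level one samples a cluster from an aggregated Beta posterior, level two runs standard Thompson sampling among the arms of the chosen cluster. All class and variable names are illustrative, not taken from the paper.

```python
import random


class TwoLevelThompsonSampling:
    """Hypothetical sketch: Bernoulli bandit whose arms are clustered.

    Level 1 samples one value per cluster from that cluster's aggregated
    Beta posterior; level 2 runs ordinary Thompson sampling over the arms
    inside the selected cluster.
    """

    def __init__(self, clusters):
        # clusters: list of lists of arm indices, e.g. [[0, 1], [2, 3, 4]]
        self.clusters = clusters
        n_arms = sum(len(c) for c in clusters)
        self.alpha = [1] * n_arms  # per-arm Beta(1, 1) priors
        self.beta = [1] * n_arms

    def select_arm(self):
        # Level 1: pick the cluster whose aggregated posterior sample is largest.
        def cluster_sample(cluster):
            a = sum(self.alpha[i] for i in cluster)
            b = sum(self.beta[i] for i in cluster)
            return random.betavariate(a, b)

        chosen = max(self.clusters, key=cluster_sample)
        # Level 2: standard Thompson sampling within the chosen cluster.
        return max(chosen,
                   key=lambda i: random.betavariate(self.alpha[i], self.beta[i]))

    def update(self, arm, reward):
        # Bernoulli reward in {0, 1} updates the pulled arm's Beta posterior.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


# Toy run: arm 4 has the highest mean reward and should dominate the pulls.
random.seed(0)
means = [0.1, 0.2, 0.3, 0.4, 0.9]
bandit = TwoLevelThompsonSampling([[0, 1], [2, 3, 4]])
pulls = [0] * 5
for _ in range(2000):
    arm = bandit.select_arm()
    pulls[arm] += 1
    bandit.update(arm, 1 if random.random() < means[arm] else 0)
```

One computational benefit the abstract alludes to is visible even here: each round samples once per cluster plus once per arm in a single cluster, rather than once for every arm.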
doi:10.24963/ijcai.2021/305