A Novel Confidence-Based Algorithm for Structured Bandits [article]

Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli
2020 arXiv pre-print
We study finite-armed stochastic bandits where the rewards of each arm might be correlated with those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by [...] other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods.
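The general idea of a phased, confidence-set-based elimination scheme can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, purely for illustration, that the structure is a finite list of candidate mean vectors containing the true one, that rewards are 1-sub-Gaussian, and that phases double in length. The function name `phased_structured_elimination`, its parameters, and the confidence radius are all hypothetical choices.

```python
import math


def phased_structured_elimination(models, pull, n_phases=10, delta=0.05):
    """Illustrative phased elimination over a finite set of candidate models.

    models: list of tuples; models[m][a] is the mean reward of arm a under
            candidate m (assumed to include the true mean vector).
    pull:   function mapping an arm index to one reward sample.
    Returns the set of arms still considered possibly optimal and the
    indices of the models still in the confidence set.
    """
    n_arms = len(models[0])
    plausible = list(range(len(models)))  # models still in the confidence set
    active = set(range(n_arms))           # arms that may still be optimal
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    # Union bound over arms, models, and phases (an assumed, simple choice).
    log_term = math.log(2 * n_arms * len(models) * n_phases / delta)

    for phase in range(1, n_phases + 1):
        # Pull every active arm 2**phase times in this phase.
        for arm in sorted(active):
            for _ in range(2 ** phase):
                sums[arm] += pull(arm)
                counts[arm] += 1

        def consistent(m):
            # A model stays plausible only if it agrees with every arm's
            # empirical mean up to that arm's confidence radius.
            for arm in range(n_arms):
                radius = math.sqrt(2 * log_term / counts[arm])
                if abs(models[m][arm] - sums[arm] / counts[arm]) > radius:
                    return False
            return True

        plausible = [m for m in plausible if consistent(m)]
        # Keep only arms that are optimal under some plausible model, so
        # samples from one arm can rule out another arm through the structure.
        active = {max(range(n_arms), key=lambda a: models[m][a])
                  for m in plausible}
        if len(active) <= 1:
            break

    return active, plausible
```

In this sketch an arm is dropped as soon as no surviving model makes it optimal, so observations of one arm can eliminate another arm without pulling it further. That is the kind of cross-arm information sharing the abstract describes, shown here only in a toy setting.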
arXiv:2005.11593v1 fatcat:3dqqmxsg2bfy5isgx3ifikkhxe