Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [article]

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet, Anand D. Sarwate
2022 arXiv pre-print
We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification; we show that our algorithm is δ-PAC and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed specifically for the quantile bandit problem: we show that as the gap approaches zero, best-arm identification becomes impossible. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support size, and we characterize that complexity. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.
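To make the successive elimination idea concrete, the following is a minimal Python sketch of a generic (non-private) quantile-based elimination loop. It is not the authors' algorithm: the confidence radius (a Dvoretzky-Kiefer-Wolfowitz bound with a union bound over arms and rounds), the stopping rule, and the names `empirical_quantile`, `quantile_successive_elimination`, and `max_rounds` are all assumptions made for illustration, and the paper's specific suboptimality gap and its differentially private variant are not reproduced here.

```python
import bisect
import math
import random


def empirical_quantile(sorted_xs, q):
    """Empirical quantile inf{x : F_n(x) >= q} of a sorted sample.

    Returns -inf for q <= 0 and +inf for q > 1, so that confidence
    bounds stay valid when the confidence radius is large.
    """
    if q <= 0.0:
        return -math.inf
    if q > 1.0:
        return math.inf
    n = len(sorted_xs)
    index = min(max(math.ceil(q * n) - 1, 0), n - 1)
    return sorted_xs[index]


def quantile_successive_elimination(arms, tau, delta, max_rounds=10_000):
    """Illustrative sketch: return the index of the arm believed to have
    the highest tau-quantile.

    `arms` is a list of zero-argument callables, each returning one reward.
    The DKW-based radius below is an assumption of this sketch, not the
    confidence bound used in the paper.
    """
    k = len(arms)
    active = list(range(k))
    samples = [[] for _ in range(k)]  # each list is kept sorted

    for n in range(1, max_rounds + 1):
        # Pull every surviving arm once; after round n each has n samples.
        for i in active:
            bisect.insort(samples[i], arms[i]())

        # DKW radius with failure probability delta / (2 * k * n^2) per
        # arm and round, which sums to less than delta overall.
        eps = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

        lower = {i: empirical_quantile(samples[i], tau - eps) for i in active}
        upper = {i: empirical_quantile(samples[i], tau + eps) for i in active}

        # Drop arms whose upper bound falls below the best lower bound.
        best_lower = max(lower.values())
        active = [i for i in active if upper[i] >= best_lower]
        if len(active) == 1:
            return active[0]

    # Budget exhausted: fall back to the best empirical tau-quantile.
    return max(active, key=lambda i: empirical_quantile(samples[i], tau))
```

As a usage example, calling `quantile_successive_elimination(arms, tau=0.5, delta=0.1)` on three Gaussian arms with unit variance and well-separated medians should return the index of the arm with the largest median. Like the algorithms described in the abstract, this sketch takes no gap as input; the elimination test adapts to the samples observed.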
arXiv:2006.06792v4 fatcat:yfrangvg5zfuhpgh45vn332juu