Greedy Confidence Pursuit: A Pragmatic Approach to Multi-bandit Optimization [chapter]

Philip Bachman, Doina Precup
2013 Lecture Notes in Computer Science  
We address the practical problem of maximizing the number of high-confidence results produced among multiple experiments sharing an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in a single bandit, a previously studied problem, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously.
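The abstract does not spell out the algorithm, but the core idea of posterior sampling for top-m arm identification can be illustrated with a minimal sketch. The following is a hypothetical Thompson-style heuristic for a single Bernoulli bandit, not the authors' method: maintain a Beta posterior per arm, sample a plausible mean for each arm, and pull the arm at the top-m boundary of the sampled ranking, since that arm's membership in the top-m set is least certain.

```python
import numpy as np

def topm_posterior_sampling(true_means, m, budget, seed=0):
    """Illustrative sketch (not the paper's algorithm): posterior
    sampling to identify the top-m arms of a Bernoulli bandit."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    # Beta(1, 1) priors: pseudo-counts of successes/failures per arm.
    alpha = np.ones(k)
    beta = np.ones(k)
    for _ in range(budget):
        # Draw one plausible mean per arm from its current posterior.
        sampled = rng.beta(alpha, beta)
        # Pull the arm ranked m-th in the sampled means: it sits on the
        # boundary of the top-m set, where confidence is lowest.
        arm = int(np.argsort(sampled)[-m])
        reward = rng.random() < true_means[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
    posterior_means = alpha / (alpha + beta)
    return set(np.argsort(posterior_means)[-m:].tolist())
```

With well-separated arms and a moderate budget, the returned set concentrates on the true top-m; the full greedy confidence pursuit problem additionally allocates the budget across several such bandits.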
doi:10.1007/978-3-642-40988-2_16