Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg
2015 Social Science Research Network  
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
doi:10.2139/ssrn.2683578 fatcat:mfc2dtzeunebdb4tijcqwho7ga