Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor
2010, Journal of Machine Learning Research
Bandit algorithms are concerned with trading off exploration against exploitation in settings where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result bounds the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, under benign assumptions about the covariance function defining the Gaussian process. We complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
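As an illustrative sketch of the setting only, not the paper's algorithm or bounds, the following shows a simple upper-confidence-bound rule on a noiseless Gaussian process posterior over a finite set of arms. The RBF kernel, length scale, and exploration weight `beta` are arbitrary choices for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance on 1-D inputs, with k(x, x) = 1."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, length_scale=0.2, jitter=1e-9):
    """Noiseless GP posterior mean and variance; jitter is numerical only."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + jitter * np.eye(len(x_obs))
    Ks = rbf_kernel(x_query, x_obs, length_scale)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.clip(var, 0.0, None)

def ucb_bandit(f, arms, n_rounds, beta=2.0):
    """Pull the arm maximising posterior mean + beta * posterior std.

    Returns the cumulative regret against always playing the best arm.
    Since rewards are noiseless, one sample per arm suffices, so repeated
    pulls of an already-observed arm do not enlarge the training set.
    """
    pulled = {0: float(f(arms[0]))}   # arbitrary first arm
    rewards = [pulled[0]]
    for _ in range(n_rounds - 1):
        idx_obs = sorted(pulled)
        mean, var = gp_posterior(arms[idx_obs],
                                 np.array([pulled[i] for i in idx_obs]),
                                 arms)
        i = int(np.argmax(mean + beta * np.sqrt(var)))
        r = float(f(arms[i]))
        rewards.append(r)
        pulled[i] = r
    return n_rounds * float(f(arms).max()) - sum(rewards)
```

In the noiseless case the posterior variance of an observed arm collapses to (numerically) zero, so the confidence term drives exploration only over unobserved regions of the arm space.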