We study the stochastic linear optimization problem with bandit feedback. The arms take values in an N-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for the expected regret that scales as $\Omega(N \log T)$. We then provide a nearly optimal algorithm and show that its expected regret scales as $O(N \log^{1+\epsilon}(T))$ for an arbitrarily small $\epsilon > 0$. The algorithm alternates between exploration and exploitation intervals sequentially where …

arXiv:1509.07927v1 fatcat:ki5hzuki5rfgjer74tgkl7ovwe
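To make the alternation between exploration and exploitation intervals concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract is truncated before it says how the intervals are built, so everything specific here is an assumption chosen for illustration. The polyhedron is taken to be the unit hypercube [0, 1]^N, exploration plays the N standard basis vectors once per cycle, the c-th exploitation interval lasts 2**c rounds, and the names `run_explore_exploit`, `theta`, `horizon`, and `noise_sd` are hypothetical.

```python
import random


def run_explore_exploit(theta, horizon, noise_sd=0.1, seed=0):
    """Alternate exploration and exploitation on the unit hypercube [0, 1]^N.

    Illustrative assumptions (not taken from the source): the arm set is the
    unit hypercube, exploration plays each standard basis vector once per
    cycle, and the c-th exploitation interval lasts 2**c rounds.
    Returns (theta_hat, last_greedy_arm, cumulative_expected_regret).
    """
    rng = random.Random(seed)
    n = len(theta)
    sums = [0.0] * n    # sum of noisy rewards observed for basis arm e_i
    counts = [0] * n    # number of times basis arm e_i was played
    theta_hat = [0.0] * n
    arm = [0.0] * n
    # On the hypercube, the best arm sets x_i = 1 exactly where theta_i > 0.
    best = sum(max(v, 0.0) for v in theta)
    regret = 0.0
    t = 0
    cycle = 0
    while t < horizon:
        # Exploration interval: play e_1, ..., e_N and record noisy rewards.
        for i in range(n):
            if t >= horizon:
                break
            reward = theta[i] + rng.gauss(0.0, noise_sd)
            sums[i] += reward
            counts[i] += 1
            regret += best - theta[i]
            t += 1
        # Estimate each coordinate from exploration samples only.
        theta_hat = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
        # Greedy arm: a linear objective over a polyhedron is maximized at a
        # vertex; on the hypercube that vertex is read off coordinate-wise.
        arm = [1.0 if v > 0 else 0.0 for v in theta_hat]
        expected = sum(a * v for a, v in zip(arm, theta))
        # Exploitation interval: play the greedy arm for 2**cycle rounds.
        # Only expected regret is tracked, so no rewards are sampled here.
        steps = min(2 ** cycle, horizon - t)
        regret += steps * (best - expected)
        t += steps
        cycle += 1
    return theta_hat, arm, regret
```

Calling `run_explore_exploit([0.5, -0.3, 0.8], horizon=2000)` returns the final estimate, the last greedy arm, and the accumulated expected regret. Because the exploitation intervals double in this sketch, the number of exploration rounds grows only logarithmically in the horizon, which is the intuition behind spending few rounds on exploration once the greedy arm has stabilized.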