We consider an adversarial variant of the classic K-armed linear contextual bandit problem in which the sequence of loss functions associated with each arm is allowed to change without restriction over time. Under the assumption that the d-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of O(√(KdT)) over T
arXiv:2002.00287v3 fatcat:fx7iy4tierfrvd65kv7s4j5yma