Mirror Descent and the Information Ratio [article]

Tor Lattimore, András György
2020 arXiv   pre-print
We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound.
arXiv:2009.12228v1 fatcat:ai56kz53pvasfe3desuu655nja