A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
What You See May Not Be What You Get: UCB Bandit Algorithms Robust to ϵ-Contamination
[article] 2020, arXiv pre-print
Motivated by applications of bandit algorithms in education, we consider a stochastic multi-armed bandit problem with ε-contaminated rewards. We allow an adversary to give arbitrary unbounded contaminated rewards with full knowledge of the past and future. We impose the constraint that for each time t the proportion of contaminated rewards for any action is less than or equal to ε. We derive concentration inequalities for two robust mean estimators for sub-Gaussian distributions in the
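The abstract does not name the two robust mean estimators, so as an illustrative sketch only, here is a minimal Python simulation of the setting it describes: a UCB-style bandit whose per-arm mean estimate uses a trimmed mean (one standard robust estimator, assumed here for illustration) while an adversary replaces up to an ε fraction of each arm's rewards with arbitrary outliers. The function names (`trimmed_mean`, `robust_ucb`), the outlier value, and the confidence-bonus form are all hypothetical choices, not the paper's.

```python
import math
import random


def trimmed_mean(values, eps):
    """Robust mean: drop the floor(eps * n) largest and smallest samples.

    With contamination limited to an eps fraction per arm, trimming an
    eps fraction from each tail discards the outliers once enough
    samples have accumulated.
    """
    k = int(math.floor(eps * len(values)))
    s = sorted(values)
    core = s[k:len(s) - k] if k > 0 else s
    return sum(core) / len(core)


def robust_ucb(arm_means, horizon, eps, seed=0):
    """UCB sketch with trimmed-mean estimates under eps-contamination.

    The adversary keeps the contaminated fraction of each arm's rewards
    at or below eps at every time step, as the abstract's constraint
    requires, but the injected values are otherwise arbitrary.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    rewards = [[] for _ in range(n_arms)]
    pulls = [0] * n_arms
    contaminated = [0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # pull each arm once to initialize
        else:
            # trimmed-mean estimate plus a standard UCB confidence bonus
            a = max(
                range(n_arms),
                key=lambda i: trimmed_mean(rewards[i], eps)
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        r = rng.gauss(arm_means[a], 1.0)  # clean sub-Gaussian reward
        # adversary: contaminate, keeping the per-arm fraction <= eps
        if contaminated[a] < eps * (pulls[a] + 1):
            r = 100.0  # arbitrary (unbounded) contaminated reward
            contaminated[a] += 1
        rewards[a].append(r)
        pulls[a] += 1
    return pulls
```

For example, `trimmed_mean([0, 1, 2, 3, 100], 0.2)` drops the single largest and smallest samples and returns 2.0, so the outlier 100 never reaches the arm index; a plain sample mean would be pulled far upward by it.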
arXiv:1910.05625v3
fatcat:ir3uek5ogrbttlamahslkp266q