Adversarial Attacks on Stochastic Bandits [article]

Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu
2018 arXiv pre-print
We study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm. We propose the first attack against two popular bandit algorithms: ϵ-greedy and UCB, without knowledge of the mean rewards. The attacker is able to spend only logarithmic effort, multiplied by a problem-specific parameter that becomes smaller as the bandit problem gets easier to attack. The result means the attacker can easily hijack the behavior of the bandit algorithm to promote or obstruct certain actions, say, a particular medical treatment. As bandits are seeing increasingly wide use in practice, our study exposes a significant security threat.
arXiv:1810.12188v1 fatcat:kc5dt6pvb5g65aidbxw7botyze
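
To make the threat model in the abstract concrete, the following is a minimal sketch of a reward-manipulation attack on an ϵ-greedy learner. It is an illustration under stated assumptions, not the attack proposed in the paper: the function name `run_attacked_epsilon_greedy`, the fixed `margin`, the Gaussian reward noise, and the exploration schedule ϵ_t = 1/t are all hypothetical choices made for this example. The attacker sees only the pulled arm and its reward, never the true means, and lowers the reward of any non-target arm just enough to keep that arm's observed average below the target arm's observed average.

```python
import random


def run_attacked_epsilon_greedy(true_means, target, horizon=2000, margin=0.5, seed=0):
    """Simulate an epsilon-greedy learner whose rewards are poisoned.

    Hypothetical sketch: the fixed `margin` rule below is a simplification
    chosen for illustration, not the attack analyzed in the paper.
    Returns (pull counts per arm, total attack cost).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # post-attack reward sums (what the learner sees)
    total_cost = 0.0      # cumulative amount subtracted by the attacker

    for t in range(1, horizon + 1):
        # Learner: pull each arm once, then epsilon-greedy with eps_t = 1/t.
        if t <= k:
            arm = t - 1
        elif rng.random() < 1.0 / t:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])

        # Environment: noisy reward drawn around the arm's true mean
        # (assumed Gaussian noise with standard deviation 0.1).
        reward = rng.gauss(true_means[arm], 0.1)

        # Attacker: uses only observed rewards. For a non-target arm, cap the
        # reward so the arm's new average is at most (target average - margin).
        if arm != target and counts[target] > 0:
            target_avg = sums[target] / counts[target]
            cap = (target_avg - margin) * (counts[arm] + 1) - sums[arm]
            if reward > cap:
                total_cost += reward - cap
                reward = cap

        counts[arm] += 1
        sums[arm] += reward

    return counts, total_cost


if __name__ == "__main__":
    # Arm 2 has the lowest true mean, yet the attacker steers the learner to it.
    counts, cost = run_attacked_epsilon_greedy([0.9, 0.5, 0.2], target=2)
    print("pulls per arm:", counts)
    print("total attack cost:", round(cost, 3))
```

In this sketch the attacker intervenes only when a non-target arm is pulled, which under a decaying exploration rate happens rarely; that is the intuition behind an attack cost that grows slowly with the horizon. The paper's own constructions and guarantees for ϵ-greedy and UCB should be taken from the paper itself.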