Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
[article] 2022, arXiv pre-print
We study the problem of K-armed dueling bandits in both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences over pairs of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandit problem to multi-armed bandits, and despite its simplicity, it allows us to improve many existing results in dueling bandits. In particular, we give the first best-of-both world result
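The reduction described above can be illustrated with a minimal sketch: two independent multi-armed bandit learners play against each other, one choosing the left arm of each duel and one the right, and each receives the binary preference outcome (or its complement) as its reward. The EXP3-style learner, the preference matrix, and all parameter values below are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import math
import random

class Exp3:
    """Minimal EXP3-style multi-armed bandit learner (illustrative sketch)."""
    def __init__(self, k, eta):
        self.k = k
        self.eta = eta
        self.log_w = [0.0] * k  # log-weights for numerical stability

    def probs(self):
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        s = sum(w)
        return [x / s for x in w]

    def draw(self):
        p = self.probs()
        return random.choices(range(self.k), weights=p)[0], p

    def update(self, arm, reward, p):
        # importance-weighted reward estimate for the drawn arm only
        self.log_w[arm] += self.eta * reward / p[arm]

def dueling_via_two_mabs(pref, T, eta=0.05, seed=0):
    """Sketch of a dueling-bandit-to-MAB reduction: the 'left' learner
    picks arm i, the 'right' learner picks arm j, a binary preference
    outcome is observed, and each learner treats it as its own reward."""
    random.seed(seed)
    k = len(pref)
    left, right = Exp3(k, eta), Exp3(k, eta)
    plays = [0] * k
    for _ in range(T):
        i, pi = left.draw()
        j, pj = right.draw()
        # i beats j with probability pref[i][j] (stochastic preference model)
        o = 1.0 if random.random() < pref[i][j] else 0.0
        left.update(i, o, pi)
        right.update(j, 1.0 - o, pj)
        plays[i] += 1
    return plays

# Hypothetical 3-arm preference matrix: arm 0 beats the others w.p. 0.8
pref = [[0.5, 0.8, 0.8],
        [0.2, 0.5, 0.6],
        [0.2, 0.4, 0.5]]
counts = dueling_via_two_mabs(pref, T=5000)
```

Under this stochastic preference matrix the left learner concentrates its plays on the Condorcet-winning arm 0 as the rounds accumulate.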
arXiv:2202.06694v1