No Regret Bound for Extreme Bandits
[article]
2016
arXiv
pre-print
We then prove that no policy can asymptotically achieve no extreme regret. ...
We define a sensible notion of "extreme regret" in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting. ...
Acknowledgements We would like to thank Balázs Kégl for valuable discussions. We would like to thank Kevin Jamieson and Ilya Tolstikhin for their feedback on earlier drafts of this paper. ...
arXiv:1508.02933v3
fatcat:k3gdtps4zrafnnys7frdrq7vca
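For context, "extreme regret" compares the best value found by the policy with the best value the single best arm would have found. One common formalization (a sketch in LaTeX, following the difference form used by Carpentier and Valko; this paper's exact definition may be normalized differently):

% K arms, horizon T; X_{k,t} is the reward of arm k at round t, I_t the arm pulled.
R_T^{\mathrm{ext}} \;=\; \max_{1 \le k \le K} \mathbb{E}\Big[ \max_{1 \le t \le T} X_{k,t} \Big] \;-\; \mathbb{E}\Big[ \max_{1 \le t \le T} X_{I_t,\,t} \Big]

A policy achieves "no extreme regret" when this quantity vanishes asymptotically; the entry's main claim is that no policy can guarantee this.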
Algorithms for Linear Bandits on Polyhedral Sets
[article]
2015
arXiv
pre-print
We provide a lower bound for the expected regret that scales as Ω(N log T). We then provide a nearly optimal algorithm and show that its expected regret scales as O(N log^{1+ε}(T)) for an arbitrarily small ε > 0. ...
We also show that the regret upper bounds hold with probability 1. ...
Since OFU algorithms play only extremal points (arms), one may think that log T regret bounds can be attained for linear bandits by treating them as K-armed bandits, where K denotes the number of extremal ...
arXiv:1509.07927v1
fatcat:ki5hzuki5rfgjer74tgkl7ovwe
Extreme Bandits using Robust Statistics
[article]
2021
arXiv
pre-print
We show that the provided algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms. ...
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest. ...
Max-Median Algorithm for Extreme Bandits In this section, we provide a distribution-free algorithm/policy for extreme bandits. ...
arXiv:2109.04433v1
fatcat:on47vcpxejbbflufcllaoozhdu
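For intuition only, a minimal Python sketch of a max-median-style policy (an illustration of the robust-statistics idea, not the authors' exact algorithm; the exploration rate and number of top order statistics are made-up parameters):

import random
import statistics

def max_median_policy(arms, horizon, explore=0.1, top=5):
    """Illustrative max-median-style extreme bandit policy (sketch only).

    arms: list of zero-argument callables, each returning one stochastic reward.
    After one forced pull per arm, either explore uniformly at random or pull
    the arm whose `top` largest observed rewards have the largest median.
    """
    history = [[arm()] for arm in arms]              # one forced pull per arm
    for _ in range(horizon - len(arms)):
        if random.random() < explore:                # forced exploration
            k = random.randrange(len(arms))
        else:                                        # robust index of "extremeness"
            k = max(range(len(arms)),
                    key=lambda i: statistics.median(sorted(history[i])[-top:]))
        history[k].append(arms[k]())
    return max(r for h in history for r in h)        # best value observed

Using a median of top order statistics rather than the raw maximum keeps the index stable under heavy-tailed noise, which is the point of a distribution-free design.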
Max K-Armed Bandit: On the ExtremeHunter Algorithm and Beyond
[chapter]
2017
Lecture Notes in Computer Science
This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. ...
We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits ...
Acknowledgments This work was supported by a public grant (Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH) and by the industrial chair Machine Learning for Big Data from Télécom ...
doi:10.1007/978-3-319-71246-8_24
fatcat:scxtewvgkvcund34egtfd26724
Bandit Market Makers
[article]
2013
arXiv
pre-print
We introduce a modular framework for market making. It combines cost-function based automated market makers with bandit algorithms. ...
This combination allows us to have distribution-free guarantees on the regret of profits while preserving the bounded worst-case losses and computational tractability over combinatorial spaces of the cost ...
To the best of our knowledge, there are no bandit algorithms for the multidimensional action space for which regret bounds have been obtained under adaptive adversaries. ...
arXiv:1112.0076v4
fatcat:dpegutgzjfg77d4ogwommwqjve
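The cost-function market makers this entry refers to are typified by Hanson's logarithmic market scoring rule (LMSR); a minimal Python sketch of the standard cost and price functions (textbook material, not this paper's specific construction):

import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_price(q, i, b=100.0):
    """Instantaneous price of outcome i: exp(q_i / b) / sum_j exp(q_j / b)."""
    z = sum(math.exp(qj / b) for qj in q)
    return math.exp(q[i] / b) / z

# A trade moving outstanding shares from q to q_new costs
# lmsr_cost(q_new) - lmsr_cost(q); the market maker's worst-case loss
# is bounded by b * log(n) for n outcomes.

The paper's framework layers a bandit algorithm on top of such a cost function to adapt it online while retaining the bounded worst-case loss.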
A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
[chapter]
2006
Lecture Notes in Computer Science
Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. ...
In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional ...
We present a new algorithm, Chernoff Interval Estimation, for the classical k-armed bandit problem and prove a bound on its regret. ...
doi:10.1007/11889205_40
fatcat:tltp22ni2bdkvgzlckstxwmk2a
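As a rough illustration of interval estimation for the classical k-armed problem (a sketch assuming rewards in [0, 1]; the paper's Chernoff-based interval and its constants differ), the policy repeatedly pulls the arm with the largest upper confidence limit:

import math

def interval_estimation_pick(pulls, means, delta=0.01):
    """Pick the arm with the largest Chernoff-style upper confidence limit.

    pulls[i]: number of pulls of arm i so far; means[i]: its empirical mean.
    Rewards are assumed to lie in [0, 1]; delta is the confidence parameter.
    """
    def upper(i):
        if pulls[i] == 0:
            return float("inf")                  # force an initial pull
        return means[i] + math.sqrt(2.0 * math.log(1.0 / delta) / pulls[i])
    return max(range(len(pulls)), key=upper)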
Fair Algorithms for Infinite and Contextual Bandits
[article]
2017
arXiv
pre-print
We also analyze the previously-unstudied question of fairness in infinite linear bandit problems, obtaining instance-dependent regret upper bounds as well as lower bounds demonstrating that this instance-dependence ...
We study fairness in linear bandit problems. ...
For short, we refer to these as 1-bandit, m-bandit, and k-bandit.
Regret The notion of regret we will consider is that of pseudo-regret. ...
arXiv:1610.09559v4
fatcat:o2rd5zjnu5dgjpxh3vcvdvmszm
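For readers unfamiliar with the term, pseudo-regret scores the policy against arm means rather than realized rewards (the standard definition, sketched in LaTeX):

% mu_k: mean reward of arm k; I_t: arm pulled at round t.
\bar{R}_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\Big[ \sum_{t=1}^{T} \mu_{I_t} \Big], \qquad \mu^{*} = \max_{k} \mu_k

Pseudo-regret lower-bounds the expected regret against the best arm in hindsight, and is the quantity most stochastic-bandit analyses actually bound.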
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
[article]
2015
arXiv
pre-print
Finally, at the end of the paper, we present a table of known upper-bounds of regret for all studied algorithms providing both perspectives for future theoretical work and a decision-making tool for practitioners ...
We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits integrating the existing research as a resource for a certain class of online experiments ...
As of this writing, there has been no known finite time analysis of regret for POKER. ...
arXiv:1510.00757v4
fatcat:eyxqdq3yl5fpdbv53wtnkfa25a
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility
[article]
2020
arXiv
pre-print
We propose Bandit and Threshold Tuning (BATT) to minimize the regret of handover failures in extreme mobility. ...
This paper formulates this trade-off in extreme mobility as a composition of two distinct multi-armed bandit problems. ...
This regret is an upper bound of the traditional regret, but is on the same order as it (that is to say, asymptotically, the two quantities will differ only by a constant factor for the same policy). ...
arXiv:2010.15237v1
fatcat:jceqwpb2uzar7fk4jzfkv2ekly
Bandit Algorithms for Precision Medicine
[article]
2021
arXiv
pre-print
The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular ...
This chapter is written for quantitative researchers in fields such as statistics, machine learning, and operations research who might be interested in knowing more about the algorithmic and mathematical ...
Non-stationarity We have discussed two basic and extreme cases of bandit theory: stochastic bandit models and adversarial bandit models. ...
arXiv:2108.04782v1
fatcat:dni5wyzyerestgs3upuzz776n4
Contextual Bandits in a Collaborative Environment
2016
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16
We rigorously prove an improved upper regret bound of the proposed collaborative bandit algorithm compared with conventional independent bandit algorithms. ...
This unfortunately ignores dependency among users and thus leads to suboptimal solutions, especially for the applications that have strong social components. ...
ACKNOWLEDGMENTS We thank the anonymous reviewers for their insightful comments. This paper is based upon work supported by the National Science Foundation under grant IIS-1553568.
doi:10.1145/2911451.2911528
dblp:conf/sigir/WuWGW16
fatcat:4vrgi5cqezehjfl2upfua3tsn4
Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs
[article]
2021
arXiv
pre-print
We present a new type of acquisition functions for online decision making in multi-armed and contextual bandit problems with extreme payoffs. ...
Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes. ...
bandits with extreme payoffs). ...
arXiv:2102.10085v2
fatcat:3r2cyu5enrhrra6pl6adsx6gk4
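The acquisition functions in question bias exploration toward inputs whose predicted payoffs are rare and large; a minimal Python sketch of one likelihood-ratio weighting in that spirit (an illustration under assumed inputs, not the paper's library or API):

import numpy as np

def output_weighted_score(mu, sigma, x_density, y_density, kappa=2.0):
    """Illustrative output-weighted acquisition over candidate arms/inputs.

    mu, sigma: GP posterior mean and standard deviation at the candidates.
    x_density: density of the input distribution at the candidates.
    y_density: density of the output distribution evaluated at mu; rare
               (potentially extreme) predicted payoffs get a small density.
    The ratio x_density / y_density up-weights candidates whose predicted
    payoff is unusual, steering sampling toward extreme events.
    """
    w = x_density / np.maximum(y_density, 1e-12)     # likelihood-ratio weights
    return (mu + kappa * sigma) * w                  # weighted optimistic score

# next_candidate = np.argmax(output_weighted_score(mu, sigma, px, py))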
Compliance-Aware Bandits
[article]
2016
arXiv
pre-print
We present hybrid algorithms that maintain regret bounds up to a multiplicative factor and can incorporate compliance information. ...
Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret. ...
The regret bound for any bandit algorithm holds since the setting is the standard bandit setting. Protocol #2: Actual. ...
arXiv:1602.02852v1
fatcat:7qcbroaribb4bffzsmfq3ehv2i
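One way to picture a hybrid with a multiplicative-factor guarantee (a sketch of the general hedging idea, not the paper's construction; the select/update interface is hypothetical): run a compliance-aware learner and a standard learner side by side and let a top-level exponential-weights rule choose which one acts each round.

import math
import random

def hedge_between(base_algos, play, horizon, eta=0.05):
    """Illustrative top-level hedging over two (or more) base bandit learners.

    base_algos: objects exposing select() -> arm and update(arm, reward)
                (a hypothetical interface for this sketch).
    play: environment callable mapping an arm to an observed reward.
    Each round, sample a base learner with probability proportional to
    exp(eta * cumulative_reward), play its arm, and credit it the reward.
    """
    scores = [0.0] * len(base_algos)
    for _ in range(horizon):
        top = max(scores)                 # subtract max for numerical stability
        weights = [math.exp(eta * (s - top)) for s in scores]
        j = random.choices(range(len(base_algos)), weights=weights)[0]
        arm = base_algos[j].select()
        reward = play(arm)
        base_algos[j].update(arm, reward)
        scores[j] += reward               # naive credit; EXP3 would importance-weight
    return scores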
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
[article]
2021
arXiv
pre-print
any bandit learning algorithm as a black-box filter for its reward signals and obtain a similar regret bound as if the reward were sub-Gaussian. ...
We show that the regret bound is near-optimal even with very heavy-tailed noise. ...
Acknowledgments The authors would like to thank anonymous reviewers for their valuable advice. ...
arXiv:2110.13876v1
fatcat:sj6t6frc3fhmjjpmidtupcn4f4
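The black-box reduction can be pictured as follows (a minimal sketch, assuming each selected arm may be pulled in a small batch; the paper's actual estimator and interface differ): aggregate a batch of raw heavy-tailed rewards through medians before the base bandit algorithm ever sees them.

import statistics

def filtered_reward(pull, batches=5, batch_size=5):
    """Illustrative mean-of-medians filter for super heavy-tailed rewards.

    pull: zero-argument callable returning one raw reward draw.
    Take the median inside each of `batches` groups of `batch_size` draws,
    then average the group medians. The filtered value concentrates much
    more tightly than a raw draw, so it can be handed to an off-the-shelf
    bandit algorithm as if the rewards were light-tailed.
    """
    medians = [statistics.median(pull() for _ in range(batch_size))
               for _ in range(batches)]
    return sum(medians) / len(medians)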
Advancements in Dueling Bandits
2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. ...
Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality ...
[Komiyama et al., 2015] as an algorithm with an optimal asymptotic regret bound, which improves upon the results for RUCB, since the regret bound for RUCB does not match the lower bound proven in [Komiyama et al., ...
doi:10.24963/ijcai.2018/776
dblp:conf/ijcai/SuiZHY18
fatcat:vfao6bpxt5aifbwyvtk3wg2cu4
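The regret notion used in this literature (sketched in LaTeX; conventions vary slightly across papers) charges each duel the average suboptimality, against the Condorcet winner a^*, of the two arms played:

% a_t^{(1)}, a_t^{(2)}: the pair dueled at round t; P(a > b): preference probability.
R_T \;=\; \sum_{t=1}^{T} \frac{\Delta(a^{*}, a_t^{(1)}) + \Delta(a^{*}, a_t^{(2)})}{2}, \qquad \Delta(a^{*}, a) = P(a^{*} \succ a) - \tfrac{1}{2}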