3,217 Hits in 3.4 sec

No Regret Bound for Extreme Bandits [article]

Robert Nishihara, David Lopez-Paz, Léon Bottou
2016 arXiv   pre-print
We then prove that no policy can asymptotically achieve no extreme regret.  ...  We define a sensible notion of "extreme regret" in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting.  ...  Acknowledgements We would like to thank Balázs Kégl for valuable discussions. We would like to thank Kevin Jamieson and Ilya Tolstikhin for their feedback on earlier drafts of this paper.  ... 
arXiv:1508.02933v3 fatcat:k3gdtps4zrafnnys7frdrq7vca
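The "extreme regret" notion in this entry can be illustrated with a short simulation. The sketch below is my own simplified difference form of extreme regret (oracle expected maximum minus the policy's expected maximum), not necessarily the paper's exact definition; the two Pareto arms and the uniform policy are hypothetical.

```python
import random

def expected_max(sample, horizon, trials=2000, seed=0):
    """Monte-Carlo estimate of E[max of `horizon` draws] for one sampler."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sample(rng) for _ in range(horizon))
    return total / trials

# Two hypothetical arms: the second has a heavier tail, so it wins on
# maxima even though its typical draw is smaller.
arm_light = lambda rng: rng.paretovariate(3.0)        # tail index 3
arm_heavy = lambda rng: 0.5 * rng.paretovariate(1.5)  # tail index 1.5

def uniform_policy(rng):
    """A naive policy that pulls an arm uniformly at random each round."""
    return arm_light(rng) if rng.random() < 0.5 else arm_heavy(rng)

T = 200
best = max(expected_max(arm_light, T), expected_max(arm_heavy, T))
policy = expected_max(uniform_policy, T)
# Extreme regret (difference form): oracle E[max] minus policy E[max].
print(f"oracle={best:.1f}  policy={policy:.1f}  extreme regret={best - policy:.1f}")
```

The heavy-tailed arm dominates the oracle maximum, while the uniform policy wastes half its pulls on the light-tailed arm, so its expected maximum falls short.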

Algorithms for Linear Bandits on Polyhedral Sets [article]

Manjesh K. Hanawal and Amir Leshem and Venkatesh Saligrama
2015 arXiv   pre-print
We provide a lower bound for the expected regret that scales as Ω(N log T). We then provide a nearly optimal algorithm and show that its expected regret scales as O(N log^(1+ϵ)(T)) for an arbitrarily small ϵ > 0.  ...  We also show that the regret upper bounds hold with probability 1.  ...  Since OFU algorithms play only extremal points (arms), one may think that log T regret bounds can be attained for linear bandits by treating them as K-armed bandits, where K denotes the number of extremal  ... 
arXiv:1509.07927v1 fatcat:ki5hzuki5rfgjer74tgkl7ovwe

Extreme Bandits using Robust Statistics [article]

Sujay Bhatt, Ping Li, Gennady Samorodnitsky
2021 arXiv   pre-print
We show that the provided algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms.  ...  We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.  ...  Max-Median Algorithm for Extreme Bandits In this section, we provide a distribution-free algorithm/policy for extreme bandits.  ... 
arXiv:2109.04433v1 fatcat:on47vcpxejbbflufcllaoozhdu
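A robust-statistics index for extreme bandits can be sketched as follows. This is an illustrative median-of-top-rewards rule of my own, standing in for the general idea of distribution-free robust indices; it is NOT the paper's exact Max-Median rule, and the two Pareto arms are hypothetical.

```python
import random
from statistics import median

def quantile_index_bandit(arms, horizon, top_k=5, eps=0.1, seed=0):
    """Illustrative extreme-bandit policy: with probability eps explore a
    random arm; otherwise exploit the arm whose median of its top-`top_k`
    observed rewards is largest. A robust proxy for tail heaviness that
    avoids distributional assumptions (not the paper's exact algorithm)."""
    rng = random.Random(seed)
    history = [[] for _ in arms]
    best_seen = float("-inf")
    for t in range(horizon):
        if t < len(arms):
            i = t % len(arms)                  # initial round-robin
        elif rng.random() < eps:
            i = rng.randrange(len(arms))       # uniform exploration
        else:
            i = max(range(len(arms)),
                    key=lambda j: median(sorted(history[j])[-top_k:]))
        r = arms[i](rng)
        history[i].append(r)
        best_seen = max(best_seen, r)
    return best_seen

arms = [lambda rng: rng.paretovariate(3.0),         # light tail
        lambda rng: 0.5 * rng.paretovariate(1.5)]   # heavy tail
print(quantile_index_bandit(arms, horizon=500))
```

The median of the top order statistics is less sensitive to a single outlier than the raw maximum, which is the kind of robustness the entry's "robust statistics" framing refers to.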

Max K-Armed Bandit: On the ExtremeHunter Algorithm and Beyond [chapter]

Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade
2017 Lecture Notes in Computer Science  
This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold.  ...  We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits  ...  Acknowledgments This work was supported by a public grant (Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH) and by the industrial chair Machine Learning for Big Data from Télécom  ... 
doi:10.1007/978-3-319-71246-8_24 fatcat:scxtewvgkvcund34egtfd26724

Bandit Market Makers [article]

Nicolas Della Penna, Mark D. Reid
2013 arXiv   pre-print
We introduce a modular framework for market making. It combines cost-function based automated market makers with bandit algorithms.  ...  This combination allows us to have distribution-free guarantees on the regret of profits while preserving the bounded worst-case losses and computational tractability over combinatorial spaces of the cost  ...  To the best of our knowledge, there are no bandit algorithms for the multidimensional action space for which regret bounds have been obtained under adaptive adversaries.  ... 
arXiv:1112.0076v4 fatcat:dpegutgzjfg77d4ogwommwqjve

A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem [chapter]

Matthew J. Streeter, Stephen F. Smith
2006 Lecture Notes in Computer Science  
Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions.  ...  In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional  ...  We present a new algorithm, Chernoff Interval Estimation, for the classical k-armed bandit problem and prove a bound on its regret.  ... 
doi:10.1007/11889205_40 fatcat:tltp22ni2bdkvgzlckstxwmk2a
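The "Chernoff Interval Estimation" idea mentioned in this entry, pulling the arm with the highest Chernoff/Hoeffding upper confidence bound, can be sketched generically. This is a standard UCB-style interval-estimation sketch under assumed [0, 1] rewards, not a reproduction of the paper's exact algorithm; the two uniform arms are hypothetical.

```python
import math
import random

def interval_estimation(arms, horizon, delta=0.05, seed=0):
    """Generic interval-estimation sketch for the classical k-armed bandit:
    always pull the arm with the highest Hoeffding upper confidence bound
    on its mean (rewards assumed to lie in [0, 1])."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)

    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # force one pull of every arm first
        width = math.sqrt(math.log(2 * horizon / delta) / (2 * counts[i]))
        return sums[i] / counts[i] + width

    total = 0.0
    for _ in range(horizon):
        i = max(range(len(arms)), key=ucb)
        r = arms[i](rng)
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts

arms = [lambda rng: rng.random() * 0.6,  # mean 0.3
        lambda rng: rng.random()]        # mean 0.5 (the better arm)
total, counts = interval_estimation(arms, horizon=2000)
print(counts)
```

As the confidence intervals shrink, pulls concentrate on the arm with the higher mean, which is the mechanism behind the regret bound the snippet mentions.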

Fair Algorithms for Infinite and Contextual Bandits [article]

Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth
2017 arXiv   pre-print
We also analyze the previously-unstudied question of fairness in infinite linear bandit problems, obtaining instance-dependent regret upper bounds as well as lower bounds demonstrating that this instance-dependence  ...  We study fairness in linear bandit problems.  ...  For short, we refer to these as 1-bandit, m-bandit, and k-bandit. Regret The notion of regret we will consider is that of pseudo-regret.  ... 
arXiv:1610.09559v4 fatcat:o2rd5zjnu5dgjpxh3vcvdvmszm
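Pseudo-regret, the notion named in this entry's snippet, compares the policy against the best arm's mean rather than against realized rewards. The following sketch (hypothetical Bernoulli arms and a deliberately naive uniform policy) computes both quantities side by side.

```python
import random

def run_policy(means, horizon, seed=0):
    """Play a uniformly random policy on Bernoulli arms and report two
    regret notions: pseudo-regret (shortfall against the best arm's
    *mean*) and realized regret (shortfall against the best single
    arm's *sampled* reward total in hindsight)."""
    rng = random.Random(seed)
    best_mean = max(means)
    pseudo = collected = 0.0
    arm_totals = [0.0] * len(means)
    for _ in range(horizon):
        i = rng.randrange(len(means))
        draws = [1.0 if rng.random() < m else 0.0 for m in means]
        collected += draws[i]
        for j, d in enumerate(draws):
            arm_totals[j] += d
        pseudo += best_mean - means[i]      # expected shortfall of the choice
    realized = max(arm_totals) - collected  # hindsight best fixed arm
    return pseudo, realized

pseudo, realized = run_policy([0.3, 0.6], horizon=1000)
print(f"pseudo-regret={pseudo:.1f}  realized regret={realized:.1f}")
```

Pseudo-regret averages out the reward noise, which is why it is the quantity most bandit upper bounds are stated for.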

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [article]

Giuseppe Burtini, Jason Loeppky, Ramon Lawrence
2015 arXiv   pre-print
Finally, at the end of the paper, we present a table of known upper bounds on regret for all studied algorithms, providing both perspectives for future theoretical work and a decision-making tool for practitioners  ...  We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits, integrating the existing research as a resource for a certain class of online experiments  ...  As of this writing, there has been no known finite-time analysis of regret for POKER.  ... 
arXiv:1510.00757v4 fatcat:eyxqdq3yl5fpdbv53wtnkfa25a

Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility [article]

Yuanjie Li, Esha Datta, Jiaxin Ding, Ness Shroff, Xin Liu
2020 arXiv   pre-print
We propose Bandit and Threshold Tuning (BATT) to minimize the regret of handover failures in extreme mobility.  ...  This paper formulates this trade-off in extreme mobility as a composition of two distinct multi-armed bandit problems.  ...  This regret is an upper bound of the traditional regret, but is of the same order (that is to say, asymptotically, the two quantities differ only by a constant factor for the same policy).  ... 
arXiv:2010.15237v1 fatcat:jceqwpb2uzar7fk4jzfkv2ekly

Bandit Algorithms for Precision Medicine [article]

Yangyi Lu, Ziping Xu, Ambuj Tewari
2021 arXiv   pre-print
The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular  ...  This chapter is written for quantitative researchers in fields such as statistics, machine learning, and operations research who might be interested in knowing more about the algorithmic and mathematical  ...  Non-stationarity We have discussed two basic and extreme cases of bandit theory: stochastic bandit models and adversarial bandit models.  ... 
arXiv:2108.04782v1 fatcat:dni5wyzyerestgs3upuzz776n4

Contextual Bandits in a Collaborative Environment

Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms.  ...  This unfortunately ignores dependency among users and thus leads to suboptimal solutions, especially for applications that have strong social components.  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their insightful comments. This paper is based upon work supported by the National Science Foundation under grant IIS-1553568. REFERENCES  ... 
doi:10.1145/2911451.2911528 dblp:conf/sigir/WuWGW16 fatcat:4vrgi5cqezehjfl2upfua3tsn4

Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs [article]

Yibo Yang, Antoine Blanchard, Themistoklis Sapsis, Paris Perdikaris
2021 arXiv   pre-print
We present a new type of acquisition function for online decision making in multi-armed and contextual bandit problems with extreme payoffs.  ...  Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes.  ...  bandits with extreme payoffs).  ... 
arXiv:2102.10085v2 fatcat:3r2cyu5enrhrra6pl6adsx6gk4

Compliance-Aware Bandits [article]

Nicolás Della Penna, Mark D. Reid, David Balduzzi
2016 arXiv   pre-print
We present hybrid algorithms that maintain regret bounds up to a multiplicative factor and can incorporate compliance information.  ...  Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret.  ...  The regret bound for any bandit algorithm holds since the setting is the standard bandit setting. Protocol #2: Actual.  ... 
arXiv:1602.02852v1 fatcat:7qcbroaribb4bffzsmfq3ehv2i

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs [article]

Han Zhong, Jiayi Huang, Lin F. Yang, Liwei Wang
2021 arXiv   pre-print
... any bandit learning algorithm as a black-box filter for its reward signals and obtain a similar regret bound as if the reward were sub-Gaussian.  ...  We show that the regret bound is near-optimal even with very heavy-tailed noise.  ...  Acknowledgments The authors would like to thank anonymous reviewers for their valuable advice.  ... 
arXiv:2110.13876v1 fatcat:sj6t6frc3fhmjjpmidtupcn4f4
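The black-box filtering idea in this entry can be made concrete with a classical robustification. Median-of-means is shown here only as one plausible filter for super heavy-tailed rewards; it is my illustrative choice, not necessarily the estimator the paper uses, and the Pareto reward source is hypothetical.

```python
import random
from statistics import median

def median_of_means_filter(sample, k=5, m=9, rng=None):
    """Filter a heavy-tailed reward signal: draw k*m samples, average
    within m groups of k, and return the median of the group means.
    A single huge outlier can corrupt at most one group, so the median
    of the group means stays bounded."""
    rng = rng or random.Random()
    means = [sum(sample(rng) for _ in range(k)) / k for _ in range(m)]
    return median(means)

rng = random.Random(0)
heavy = lambda r: r.paretovariate(1.1)  # barely-integrable tail

raw = [heavy(rng) for _ in range(200)]
filtered = [median_of_means_filter(heavy, rng=rng) for _ in range(200)]
print(max(raw), max(filtered))  # filtering tames the extreme outliers
```

A bandit algorithm fed the filtered values instead of the raw draws sees a far better-behaved reward signal, which is the spirit of the black-box reduction the snippet describes.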

Advancements in Dueling Bandits

Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form.  ...  Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality  ...  ., 2015] as an algorithm with an optimal asymptotic regret bound, which improves upon the results for RUCB, since the regret bound for RUCB does not match the lower bound proven in [Komiyama et al.,  ... 
doi:10.24963/ijcai.2018/776 dblp:conf/ijcai/SuiZHY18 fatcat:vfao6bpxt5aifbwyvtk3wg2cu4
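The relative-feedback model this entry describes can be sketched with a minimal dueling loop. The incumbent-vs-challenger rule below is a deliberately simple illustration of learning from noisy binary comparisons; it is far cruder than RUCB-style algorithms with confidence bounds, and the preference matrix is hypothetical.

```python
import random

def duel(i, j, pref, rng):
    """Noisy binary feedback: True iff arm i beats arm j, where
    pref[i][j] is the probability that i wins the comparison."""
    return rng.random() < pref[i][j]

def dueling_sketch(pref, horizon, seed=0):
    """Minimal dueling-bandit loop: repeatedly compare the current
    incumbent against a random challenger and switch incumbents when
    the challenger leads the head-to-head win count."""
    rng = random.Random(seed)
    n = len(pref)
    wins = [[0] * n for _ in range(n)]
    incumbent = 0
    for _ in range(horizon):
        challenger = rng.randrange(n)
        if challenger == incumbent:
            continue
        if duel(incumbent, challenger, pref, rng):
            wins[incumbent][challenger] += 1
        else:
            wins[challenger][incumbent] += 1
        if wins[challenger][incumbent] > wins[incumbent][challenger]:
            incumbent = challenger
    return incumbent

# Hypothetical preference matrix: arm 2 is the Condorcet winner,
# beating each other arm with probability 0.7.
pref = [[0.5, 0.6, 0.3],
        [0.4, 0.5, 0.3],
        [0.7, 0.7, 0.5]]
print(dueling_sketch(pref, horizon=5000))
```

With enough comparisons the incumbent drifts to the Condorcet winner; the algorithms surveyed in the entry achieve this with provable regret bounds rather than this ad-hoc switching rule.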
Showing results 1 — 15 out of 3,217 results