
Top K Ranking for Multi-Armed Bandit with Noisy Evaluations [article]

Evrard Garcelon and Vashist Avadhanula and Alessandro Lazaric and Matteo Pirotta
2022 arXiv   pre-print
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm and selects K  ...  arms with the objective of accumulating as much reward as possible over T rounds.  ...  Multi-feedback bandit learning with probabilistic contexts.  ... 
arXiv:2112.06517v4 fatcat:q4spwhtf3fezrpxr5auutvguma
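The setting described in the snippet above — per-round noisy evaluations of every arm, from which K arms are played — can be illustrated with a minimal greedy baseline that simply trusts the evaluations. This is a generic sketch for intuition, not the paper's algorithm; the Gaussian noise model and all parameter values are assumptions:

```python
import random

def greedy_top_k(true_means, noise_sd, k, horizon, rng):
    """Greedy baseline for top-K ranking with noisy evaluations: each round
    the learner sees an independent noisy evaluation of every arm's reward
    and plays the K arms with the highest evaluations, accumulating the
    true (unobserved) reward of the arms it chose."""
    total = 0.0
    for _ in range(horizon):
        evals = [m + rng.gauss(0.0, noise_sd) for m in true_means]
        chosen = sorted(range(len(true_means)),
                        key=lambda i: evals[i], reverse=True)[:k]
        total += sum(true_means[i] for i in chosen)
    return total
```

With zero noise this recovers the true top-K arms every round; the interesting regime studied in the paper is when the evaluations are noisy and possibly biased.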

Exploiting Class Learnability in Noisy Data

Matthew Klawonn, Eric Heim, James Hendler
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data.  ...  Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with  ...  First, we begin with some preliminaries of Multi-Armed Bandit algorithms.  ... 
doi:10.1609/aaai.v33i01.33014082 fatcat:5f3hq2ddoff4dmdkqtofw23nja

Exploiting Class Learnability in Noisy Data [article]

Matthew Klawonn, Eric Heim, James Hendler
2018 arXiv   pre-print
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data.  ...  Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with  ...  First, we begin with some preliminaries of Multi-Armed Bandit algorithms.  ... 
arXiv:1811.06524v1 fatcat:bkxo5dxjrvd2jle23wxuri7tkq

Learning to Identify Top Elo Ratings: A Dueling Bandits Approach [article]

Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen
2022 arXiv   pre-print
In this paper, to improve the sample efficiency of the Elo evaluation (for top players), we propose an efficient online match scheduling algorithm.  ...  Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo.  ...  This work studied the problem of multi-agent evaluation with Elo ratings.  ... 
arXiv:2201.04480v2 fatcat:cvyei5mgj5firl3fk2z5bc62r4
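The "gradient-based update of Elo" the snippet refers to is the standard Elo rule, which moves a rating in proportion to the difference between the observed and expected match outcome. A generic sketch (this is the classical formula, not the paper's scheduling algorithm; the K-factor of 32 is a conventional assumption):

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One gradient-style Elo update after a match.
    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss."""
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

Because rating changes are zero-sum here, which matches get scheduled determines where information accumulates — the motivation for the dueling-bandits scheduler above.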

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation [article]

Brian Brost and Yevgeny Seldin and Ingemar J. Cox and Christina Lioma
2016 arXiv   pre-print
New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers.  ...  Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons.  ...  In multi-dueling bandits, at each iteration t, an algorithm selects a subset S_t of K arms and observes outcomes of noisy pairwise comparisons (duels) between all pairs of arms in S_t.  ... 
arXiv:1608.06253v1 fatcat:pvzmbibkd5gtfem4vsafzkfs64

Preference-based Online Learning with Dueling Bandits: A Survey [article]

Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
2021 arXiv   pre-print
This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction  ...  The aim of this paper is to provide a survey of the state of the art in this field, referred to as preference-based multi-armed bandits or dueling bandits.  ...  We would also like to thank two anonymous referees for their valuable comments and suggestions, which helped to significantly improve this survey.  ... 
arXiv:1807.11398v2 fatcat:jsu6gap5pbgbtm735fgf4aqwmu

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma
2016 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16  
It can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons.  ...  The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration. Methods for simultaneously comparing more than two rankers have recently been developed.  ...  THE MULTI-DUELING BANDIT PROBLEM: In multi-dueling bandits, at each iteration t, an algorithm selects a subset S_t of K arms and observes outcomes of noisy pairwise comparisons (duels) between all pairs of arms in S_t.  ... 
doi:10.1145/2983323.2983659 dblp:conf/cikm/BrostSCL16 fatcat:lopvamobzjdrbnwxvhoktwuguq
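The interaction protocol in this snippet — select a subset S_t, then observe noisy duels between every pair in S_t — can be sketched as a small simulation. The uniform subset-selection rule and Bernoulli preference matrix below are illustrative assumptions, not the authors' algorithm:

```python
import itertools
import random

def run_multi_dueling(pref, subset_size, horizon, rng):
    """Simulate the multi-dueling bandit protocol: at each iteration t the
    learner picks a subset S_t of arms and observes a noisy duel between
    every pair of arms in S_t.  pref[i][j] = P(arm i beats arm j).
    Returns the matrix of observed pairwise win counts."""
    n = len(pref)
    wins = [[0] * n for _ in range(n)]
    for _ in range(horizon):
        s_t = rng.sample(range(n), subset_size)  # placeholder selection rule
        for i, j in itertools.combinations(s_t, 2):
            if rng.random() < pref[i][j]:
                wins[i][j] += 1
            else:
                wins[j][i] += 1
    return wins
```

A real multi-dueling algorithm replaces the random `s_t` with a choice driven by the win statistics collected so far.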

MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation [article]

Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi
2020 arXiv   pre-print
It is captured by the K-armed dueling bandit problem, which is a variant of the K-armed bandit problem, where the feedback comes in the form of pairwise preferences.  ...  Today's deployed search systems can evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers is a critical aspect of K-armed dueling bandit problems.  ...  We also thank our editor and the anonymous reviewers for extensive comments and suggestions that helped us to improve the paper.  ... 
arXiv:1812.04412v2 fatcat:jgtm6ukpknh3ppmsjyhmfkvh7u

A Survey of Preference-Based Online Learning with Bandit Algorithms [chapter]

Róbert Busa-Fekete, Eyke Hüllermeier
2014 Lecture Notes in Computer Science  
This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction  ...  The aim of this paper is to provide a survey of the state of the art in this field, which we refer to as preference-based multi-armed bandits.  ...  The authors are grateful for financial support by the German Research Foundation (DFG).  ... 
doi:10.1007/978-3-319-11662-4_3 fatcat:lbgfw6q77vakfpxok2voujro2a

Advancements in Dueling Bandits

Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality  ...  The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form.  ...  ., 2015] proposes a new structural assumption for the K-armed dueling bandits problem in which the top arms can be distinguished by duels with a sparse set of other arms.  ... 
doi:10.24963/ijcai.2018/776 dblp:conf/ijcai/SuiZHY18 fatcat:vfao6bpxt5aifbwyvtk3wg2cu4

Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment [article]

Govinda M. Kamath and Tavor Z. Baharav and Ilan Shomorony
2021 arXiv   pre-print
The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignments.  ...  The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models, where the workers' responses correspond to k-mer hash collisions  ...  noisy data, and multi-armed bandits.  ... 
arXiv:2011.04832v2 fatcat:ubtgbpjscvaino323n2v4jkmfq

Objective Social Choice: Using Auxiliary Information to Improve Voting Outcomes [article]

Silviu Pitis, Michael R. Zhang
2020 arXiv   pre-print
In our present work, we: (1) define our problem and argue that it reflects common and socially relevant real world scenarios, (2) propose a multi-arm bandit noise model and count-based auxiliary information  ...  set, (3) derive maximum likelihood aggregation rules for ranked and cardinal votes under our noise model, (4) propose, alternatively, to learn an aggregation rule using an order-invariant neural network  ...  ACKNOWLEDGMENTS We thank Nisarg Shah for his guidance throughout this project. We also thank Jimmy Ba, Harris Chan, Mufan Li and the anonymous referees for their helpful comments.  ... 
arXiv:2001.10092v1 fatcat:tmwqmvfrfra5rhuzrcr5fwm454

MaxGap Bandit: Adaptive Algorithms for Approximate Ranking [article]

Sumeet Katariya, Ardhendu Tripathy, Robert Nowak
2019 arXiv   pre-print
This problem arises naturally in approximate ranking, noisy sorting, outlier detection, and top-arm identification in bandits.  ...  This paper studies the problem of adaptively sampling from K distributions (arms) in order to identify the largest gap between any two adjacent means. We call this the MaxGap-bandit problem.  ...  This model encompasses many problems including best-arms identification in multi-armed bandits, noisy sorting and ranking, and outlier detection.  ... 
arXiv:1906.00547v1 fatcat:a6onwev5orc7re3q476rax2uqq
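The MaxGap objective itself is simple to state: given the arm means, sort them and return the largest gap between adjacent values. A non-adaptive reference computation on known means (the paper's contribution is the adaptive sampling needed when means must be estimated) might look like:

```python
def max_gap(means):
    """Return (gap, (lower_arm, upper_arm)) for the largest gap between
    adjacent means, where arms keep their original indices."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    best = (0.0, (order[0], order[0]))
    for lo, hi in zip(order, order[1:]):
        gap = means[hi] - means[lo]
        if gap > best[0]:
            best = (gap, (lo, hi))
    return best
```

The gap location splits the arms into a high group and a low group, which is what makes it useful for approximate ranking and outlier detection.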

Sparse Dueling Bandits [article]

Kevin Jamieson, Sumeet Katariya, Atul Deshpande, Robert Nowak
2015 arXiv   pre-print
The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms.  ...  This paper focuses on a new approach for finding the "best" arm according to the Borda criterion using noisy comparisons.  ...  INTRODUCTION The dueling bandit is a variation of the classic multi-armed bandit problem in which the actions are noisy comparisons between arms, rather than observations from the arms themselves (Yue  ... 
arXiv:1502.00133v1 fatcat:shpx27jh5na2rhvmvgpg2lpld4
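The Borda criterion mentioned in this snippet scores each arm by its probability of beating a uniformly random opponent. A minimal Monte-Carlo estimator from noisy comparisons — illustrative only, not the paper's sparse-comparison algorithm — could be:

```python
import random

def estimate_borda(duel, n_arms, samples_per_arm, rng):
    """Estimate Borda scores: for each arm, duel it against uniformly
    random opponents and record the fraction of noisy duels it wins.
    `duel(i, j)` returns True iff arm i beats arm j in one comparison."""
    scores = []
    for i in range(n_arms):
        opponents = [j for j in range(n_arms) if j != i]
        wins = sum(duel(i, rng.choice(opponents))
                   for _ in range(samples_per_arm))
        scores.append(wins / samples_per_arm)
    return scores
```

The "best" arm under the Borda criterion is then the arm with the highest estimated score; the paper's structural assumption lets this be found with far fewer comparisons.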

Multi-dueling Bandits with Dependent Arms [article]

Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue
2017 arXiv   pre-print
In this paper, we study the problem of multi-dueling bandits with dependent arms, which extends the original dueling bandits setting by simultaneously dueling multiple arms as well as modeling dependencies  ...  We propose the Self-Sparring algorithm, which reduces the multi-dueling bandits problem to a conventional bandit setting that can be solved using a stochastic bandit algorithm such as Thompson Sampling  ...  Multi-Dueling Bandits Experiments: We next evaluate the multi-dueling setting with independent arms.  ... 
arXiv:1705.00253v1 fatcat:6yynr7sxsfbgbowb2qplhibtuy
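The reduction this snippet describes — running a stochastic bandit algorithm such as Thompson Sampling and letting the sampled arms "spar" against each other — can be sketched with independent Beta posteriors. This simplified independent-arm version is an assumption for illustration; the paper's algorithm also models dependencies between arms:

```python
import random

def self_sparring(duel, n_arms, horizon, rng):
    """Simplified self-sparring loop: maintain a Beta(wins+1, losses+1)
    posterior per arm, draw a Thompson sample for each arm, let the two
    highest-sampled arms duel, and credit the winner.
    `duel(i, j)` returns True iff arm i beats arm j in one noisy duel."""
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(horizon):
        theta = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                 for a in range(n_arms)]
        order = sorted(range(n_arms), key=lambda a: theta[a], reverse=True)
        i, j = order[0], order[1]  # two highest posterior samples spar
        winner, loser = (i, j) if duel(i, j) else (j, i)
        wins[winner] += 1
        losses[loser] += 1
    return wins, losses
```

Because duel outcomes feed the same posteriors that drive selection, exploration concentrates on the arms most likely to be best, exactly the reduction to a conventional bandit the abstract describes.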
Showing results 1 — 15 out of 626 results