900 Hits in 2.4 sec

Advancements in Dueling Bandits

Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem.  ...  The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form.  ...  In this survey, we overview recent advances in research on dueling bandits. For a thorough review of early work on dueling bandits, we refer readers to [Busa-Fekete and Hüllermeier, 2014] .  ... 
doi:10.24963/ijcai.2018/776 dblp:conf/ijcai/SuiZHY18 fatcat:vfao6bpxt5aifbwyvtk3wg2cu4

KLUCB Approach to Copeland Bandits [article]

Nischal Agrawal, Prasanna Chaporkar
2019 arXiv   pre-print
Previous UCB algorithms such as Relative Upper Confidence Bound (RUCB) can be applied only in the case of Condorcet dueling bandits, whereas this algorithm applies to general Copeland dueling bandits, including  ...  Our empirical results outperform the state-of-the-art Double Thompson Sampling (DTS) in the case of Copeland dueling bandits.  ...  Dueling Bandits Problem We consider a K-armed dueling bandit problem with K ≥ 2 finite. We define A = {1, 2, ..., K} as the set of arms. Time proceeds in rounds indexed by n = 1, 2, ..., T.  ... 
arXiv:1902.02778v1 fatcat:pcwyxym45bgcdfjpqtord3a6wm
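The K-armed setup quoted in the abstract above (arms A = {1, ..., K}, rounds n = 1, ..., T, relative feedback only) can be sketched with a toy simulator; the preference matrix and the uniformly random learner below are illustrative assumptions, not part of the paper:

```python
import random

random.seed(0)

# Toy 3-armed dueling bandit instance: P[i][j] is the probability that
# arm i beats arm j in a duel, with P[i][j] + P[j][i] = 1.
# Arm 0 is the Condorcet winner: it beats every other arm w.p. > 0.5.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
K = len(P)

def duel(i, j):
    """Simulate one comparison; True means arm i wins against arm j."""
    return random.random() < P[i][j]

# A uniformly random learner over T rounds, recording pairwise wins.
T = 10_000
wins = [[0] * K for _ in range(K)]
for n in range(1, T + 1):
    i, j = random.sample(range(K), 2)   # pick a pair of arms to duel
    if duel(i, j):
        wins[i][j] += 1
    else:
        wins[j][i] += 1
```

The learner only ever sees which arm won each duel, never a numeric reward; that is the defining restriction of the dueling bandits setting.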

Online Learning of Visualization Preferences through Dueling Bandits for Enhancing Visualization Recommendations

Jan-Frederik Kassel, Michael Rohs
2019 Eurographics Conference on Visualization  
In order to close this gap we explore online learning of visualization preferences through dueling bandits. Additionally, we consider this challenge from a usability perspective.  ...  While our findings affirm the applicability of dueling bandits, they further provide insights on both the needed training time in order to achieve a usability-aligned procedure and the generalizability  ...  In our case, the dueling bandit shows two visualizations to the user each in accordance with its exploration strategy.  ... 
doi:10.2312/evs.20191175 dblp:conf/vissym/KasselR19 fatcat:df545zy5kncx3lce2ol5ai2tnq

Regret Minimization in Stochastic Contextual Dueling Bandits [article]

Aadirupa Saha, Aditya Gopalan
2021 arXiv   pre-print
However, unlike the classical contextual bandit setup, our framework only allows the learner to receive item feedback in terms of their (noisy) pairwise preferences – famously studied as dueling bandits  ...  We consider the problem of stochastic K-armed dueling bandit in the contextual setting, where at each round the learner is presented with a context set of K items, each represented by a d-dimensional feature  ...  Efficient learning by implicit exploration in bandit problems with side observations. In Advances in Neural Information Processing Systems, pages 613-621, 2014.  ... 
arXiv:2002.08583v2 fatcat:nxobgxl5jnfa7gx6hzt7qvrei4

Reducing Dueling Bandits to Cardinal Bandits [article]

Nir Ailon and Thorsten Joachims and Zohar Karnin
2014 arXiv   pre-print
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem.  ...  Bandit algorithms to the Dueling Bandits setting.  ...  This research was funded in part by NSF Awards IIS-1217686 and IIS-1247696, a Marie Curie Reintegration Grant PIRG07-GA-2010-268403, an Israel Science Foundation grant 1271/33 and a Jacobs Technion-Cornell  ... 
arXiv:1405.3396v1 fatcat:l3jo432ld5fojbkksoydxgwrd4
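A reduction in the spirit of the one this abstract describes can be sketched by letting two ordinary cardinal bandit learners "spar": each picks one arm of the duel, and the winner's learner receives reward 1. The epsilon-greedy learner and all parameters below are illustrative assumptions, not the paper's exact algorithms:

```python
import random

class EpsGreedy:
    """A minimal epsilon-greedy cardinal bandit used inside the reduction."""
    def __init__(self, K, eps=0.1, rng=None):
        self.K, self.eps = K, eps
        self.rng = rng or random.Random()
        self.n = [0] * K      # pull counts per arm
        self.s = [0.0] * K    # cumulative reward per arm
    def pick(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.K)
        # Unpulled arms get priority via +inf estimated value.
        return max(range(self.K),
                   key=lambda i: self.s[i] / self.n[i] if self.n[i] else float("inf"))
    def update(self, arm, reward):
        self.n[arm] += 1
        self.s[arm] += reward

def sparring(P, T, seed=2):
    """Run two cardinal learners against each other on dueling feedback."""
    rng = random.Random(seed)
    K = len(P)
    left, right = EpsGreedy(K, rng=rng), EpsGreedy(K, rng=rng)
    for _ in range(T):
        i, j = left.pick(), right.pick()
        left_wins = rng.random() < P[i][j]   # only relative feedback exists
        left.update(i, 1.0 if left_wins else 0.0)
        right.update(j, 0.0 if left_wins else 1.0)
    return left, right

# Toy instance: arm 0 beats every other arm with probability > 0.5.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
left, right = sparring(P, 10_000)
```

Because arm 0 dominates every row of P, both learners' reward-maximizing arm is arm 0 regardless of what the opponent plays, so both concentrate their pulls there.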

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem [article]

Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke
2013 arXiv   pre-print
This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.  ...  In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.  ...  the continuous dueling bandit setting, without a convexity assumption as in (Yue & Joachims, 2009).  ... 
arXiv:1312.3393v2 fatcat:q64qmlbtafeydal7xwfm3udmeq
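The core RUCB idea, optimistic pairwise win-rate estimates selecting a "champion" that no arm optimistically beats and a "challenger" with the best optimistic chance against it, can be sketched as follows. The matrix, the constant alpha, and the helper names are assumptions for illustration, not the paper's exact pseudocode:

```python
import math
import random

def rucb(P, T, alpha=0.51, seed=0):
    """Sketch of a Relative-UCB-style dueling bandit loop."""
    rng = random.Random(seed)
    K = len(P)
    W = [[0] * K for _ in range(K)]   # W[i][j]: wins of arm i over arm j
    plays = [0] * K
    for t in range(1, T + 1):
        def ucb(i, j):
            """Optimistic estimate of P(i beats j)."""
            n = W[i][j] + W[j][i]
            if n == 0:
                return 1.0
            return W[i][j] / n + math.sqrt(alpha * math.log(t) / n)
        # Candidate champions: arms not optimistically beaten by any other arm.
        C = [i for i in range(K)
             if all(ucb(i, j) >= 0.5 for j in range(K) if j != i)]
        c = rng.choice(C) if C else rng.randrange(K)
        # Challenger: the arm with the best optimistic chance of beating c.
        d = max((j for j in range(K) if j != c), key=lambda j: ucb(j, c))
        # Duel c against d and record the outcome.
        if rng.random() < P[c][d]:
            W[c][d] += 1
        else:
            W[d][c] += 1
        plays[c] += 1
        plays[d] += 1
    return W, plays

# Toy instance with arm 0 as Condorcet winner.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
W, plays = rucb(P, 4000)
```

As the confidence intervals shrink, the candidate set collapses onto the Condorcet winner, which then serves as champion in almost every round.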

Double Thompson Sampling for Dueling Bandits [article]

Huasen Wu, Xin Liu
2016 arXiv   pre-print
In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems.  ...  This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case.  ...  In practice, we may not know in advance whether we have a Condorcet or a non-Condorcet dueling bandit; we may also face a time-varying system and delayed feedback.  ... 
arXiv:1604.07101v2 fatcat:l7mooquranbzpohyqblnwhe5ie
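The "double" sampling idea can be sketched like this: one draw from the Beta posteriors selects a first arm as the Copeland winner of a sampled preference matrix, and a fresh draw restricted to opponents of that arm selects the second. This is a simplified illustration under assumed parameters, not the paper's exact algorithm (in particular, the real D-TS also allows self-duels and uses confidence bounds to prune candidates):

```python
import random

def d_ts(P, T, seed=1):
    """Sketch of a Double-Thompson-Sampling-style dueling bandit loop."""
    rng = random.Random(seed)
    K = len(P)
    # Beta(1,1) priors kept as pseudo-counts: W[i][j] ~ "wins of i over j" + 1.
    W = [[1] * K for _ in range(K)]
    for _ in range(T):
        # First sampling: draw a full preference matrix, take its Copeland winner.
        theta = [[0.5 if i == j else rng.betavariate(W[i][j], W[j][i])
                  for j in range(K)] for i in range(K)]
        copeland = [sum(theta[i][j] > 0.5 for j in range(K) if j != i)
                    for i in range(K)]
        a1 = max(range(K), key=lambda i: copeland[i])
        # Second sampling: fresh draws against a1, pick the strongest challenger.
        theta2 = [rng.betavariate(W[j][a1], W[a1][j]) if j != a1 else 0.0
                  for j in range(K)]
        a2 = max(range(K), key=lambda j: theta2[j])
        # Observe the duel outcome and update the posterior counts.
        if rng.random() < P[a1][a2]:
            W[a1][a2] += 1
        else:
            W[a2][a1] += 1
    return W

# Toy instance with arm 0 as Condorcet (hence Copeland) winner.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
W = d_ts(P, 3000)
```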

Factored Bandits [article]

Julian Zimmert, Yevgeny Seldin
2018 arXiv   pre-print
Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits.  ...  We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of  ...  Dueling bandits The set of actions in dueling bandits is factored into A × A.  ... 
arXiv:1807.01488v2 fatcat:h27ktgcjbjfhzhk6qezvyjyiwq

Bandit Algorithms in Information Retrieval

Dorota Glowacka
2019 Foundations and Trends in Information Retrieval  
Dorota Głowacka (2019), "Bandit Algorithms in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 13, No. 4, pp. 299-424. DOI: 10.1561/1500000067.  ...  Chapter 5 focuses mostly on dueling bandits algorithms and their application to ranking. In Chapter 6, various bandit approaches used in recommender systems are described.  ...  In: Advances in Neural Information Processing Systems. 3347-3355. Brost, B., I. J. Cox, Y. Seldin, and C. Lioma. 2016a. "An improved multileaving algorithm for online ranker evaluation".  ... 
doi:10.1561/1500000067 fatcat:api5ljs5abbwdckujtsgwp27o4

Verification Based Solution for Structured MAB Problems

Zohar S. Karnin
2016 Neural Information Processing Systems  
We demonstrate the effectiveness of our framework by applying it, and matching or improving the state-of-the-art results, in the problems of: Linear bandits, Dueling bandits with the Condorcet assumption, Copeland dueling bandits, Unimodal bandits and Graphical bandits.  ...  We demonstrated the effectiveness of our framework by improving the state-of-the-art results in several MAB problems.  ... 
dblp:conf/nips/Karnin16 fatcat:vawtcytaivfnthkqwehobbmtaq

Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Siddartha Y. Ramamohan, Arun Rajkumar, Shivani Agarwal
2016 Neural Information Processing Systems  
We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show O(log T) anytime regret bounds for them.  ...  Recent work on deriving O(log T) anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by  ...  which the algorithm does not need to know the horizon or number of trials T in advance.  ... 
dblp:conf/nips/RamamohanR016 fatcat:2chzo7zygncyjomkqbc3euiadm

Cluster Based Deep Contextual Reinforcement Learning for top-k Recommendations [article]

Anubha Kabra, Anu Agarwal, Anil Singh Parihar
2020 arXiv   pre-print
Rapid advancements in the E-commerce sector over the last few decades have led to an imminent need for personalised, efficient and dynamic recommendation systems.  ...  The Duelling Bandit based exploration provides robust exploration as compared to the state-of-the-art strategies due to its adaptive nature.  ...  Dueling Bandit Gradient Descent is an adaptive exploration strategy, unlike epsilon-greedy, UCB, TS, etc.  ... 
arXiv:2012.02291v1 fatcat:ak7qbf6ngve3hlz6ozuvz3wq4y
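One step of the Dueling Bandit Gradient Descent strategy mentioned above can be sketched as: perturb the current parameters in a random unit direction, duel the perturbed candidate against the current point, and take a small step toward the candidate only if it wins. The preference oracle, the hidden target, and the step sizes below are hypothetical illustrations, not the cited paper's setup:

```python
import math
import random

rng = random.Random(3)

def dbgd_step(w, prefers_candidate, delta=0.2, gamma=0.1):
    """One DBGD-style update on parameter vector w.

    `prefers_candidate(w, cand)` is an assumed duel oracle returning True
    when the perturbed candidate is preferred over the current point.
    """
    d = len(w)
    # Draw a random unit direction u.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in u)) or 1.0
    u = [x / norm for x in u]
    candidate = [wi + delta * ui for wi, ui in zip(w, u)]
    if prefers_candidate(w, candidate):
        # Candidate won the duel: move a small step toward it.
        w = [wi + gamma * ui for wi, ui in zip(w, u)]
    return w

# Hypothetical oracle: the candidate wins the duel iff it is closer to a
# hidden target (standing in for a user's ideal parameters).
target = [1.0, 1.0]
def dist(v):
    return math.sqrt(sum((vi - ti) ** 2 for vi, ti in zip(v, target)))

w = [0.0, 0.0]
for _ in range(500):
    w = dbgd_step(w, lambda a, b: dist(b) < dist(a))
```

Because only duel outcomes drive the updates, the same loop works whenever preferences can be elicited but numeric rewards cannot, which is what makes it attractive for interactive recommendation.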

Preference-based Online Learning with Dueling Bandits: A Survey [article]

Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
2021 arXiv   pre-print
The aim of this paper is to provide a survey of the state of the art in this field, referred to as preference-based multi-armed bandits or dueling bandits.  ...  In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives  ...  dueling bandits algorithm for the cumulative regret in (2).  ... 
arXiv:1807.11398v2 fatcat:jsu6gap5pbgbtm735fgf4aqwmu

Preference-based Online Learning with Dueling Bandits: A Survey

Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
2021 Journal of machine learning research  
The aim of this paper is to provide a survey of the state of the art in this field, referred to as preferencebased multi-armed bandits or dueling bandits.  ...  In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives  ...  dueling bandits algorithm for the cumulative regret in (2).  ... 
dblp:journals/jmlr/BengsBMH21 fatcat:mdxi3bzymrb37ckbxbh27pu6f4

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm [article]

Junpei Komiyama, Junya Honda, Hiroshi Nakagawa
2016 arXiv   pre-print
We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms.  ...  Experimental comparisons of dueling bandit algorithms show that ECW-RMED significantly outperforms existing ones.  ...  Acknowledgements This work was supported in part by JSPS KAKENHI Grant Number 15J09850 and 16H00881.  ... 
arXiv:1605.01677v2 fatcat:vmgfq7rhz5hunhbh7iikyyk5eu
Showing results 1 — 15 out of 900 results