372 Hits in 2.9 sec

Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation [article]

Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, Zhirong Wang
2020 arXiv   pre-print
Second, we propose a novel contextual combinatorial bandit method called UBM-LinUCB to address two issues related to positions by adopting the User Browsing Model (UBM), a click model for web search.  ...  Results on two CTR metrics show that our algorithm outperforms the other contextual bandit algorithms.  ...  DCM-LinUCB is a contextual bandit algorithm based on the Dependent Click Model.  ...
arXiv:2008.09368v1 fatcat:ua4aqkhhrvhb3lvljdvnp5pi3q
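The snippet does not spell out UBM-LinUCB, but its core idea — weighting a linear bandit's updates by a position- and last-click-dependent examination probability — can be sketched as below. The gamma table and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hedged sketch of a UBM-weighted linear-bandit update: the User Browsing
# Model says an item at position `pos` is examined with probability
# gamma[pos, pos - last_click_pos], so the ridge-regression statistics
# are discounted by that weight. The gamma values below are made up.
def ubm_weighted_update(A, b, x, pos, last_click_pos, clicked, gamma):
    w = gamma[pos, pos - last_click_pos]   # examination probability
    A += w * np.outer(x, x)                # weighted Gram-matrix update
    b += w * float(clicked) * x            # weighted response update
    return A, b

# Toy usage: 5-dim item features, positions 0..9.
d = 5
A, b = np.eye(d), np.zeros(d)
gamma = np.full((10, 11), 0.5)             # assumed examination table
A, b = ubm_weighted_update(A, b, np.random.rand(d), pos=3,
                           last_click_pos=0, clicked=True, gamma=gamma)
theta = np.linalg.solve(A, b)              # attractiveness estimate
```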

Automatic ad format selection via contextual bandits

Liang Tang, Romer Rosales, Ajit Singh, Deepak Agarwal
2013 Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13  
To balance exploration with exploitation, we pose automatic layout selection as a contextual bandit problem. There are many bandit algorithms, each generating a policy which must be evaluated.  ...  We describe the development of our offline replayer, and benchmark a number of common bandit algorithms.  ...  In this context, [7] presented an approach for modeling click response for different ad arrangement templates on a web-page.  ... 
doi:10.1145/2505515.2514700 dblp:conf/cikm/TangRSA13 fatcat:pd5ypbxtanb23otit7teyikpci
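As a concrete example of the kind of policy such a replayer benchmarks, here is a minimal epsilon-greedy layout selector; the class and its fields are illustrative, not the paper's implementation.

```python
import random
from collections import defaultdict

# Hedged sketch: layout/format selection posed as a contextual bandit,
# with epsilon-greedy exploration over a discrete set of layouts.
class EpsilonGreedyLayouts:
    def __init__(self, layouts, epsilon=0.1):
        self.layouts = layouts
        self.epsilon = epsilon
        self.clicks = defaultdict(float)   # (context, layout) -> clicks
        self.shows = defaultdict(float)    # (context, layout) -> impressions

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.layouts)        # explore
        return max(self.layouts,                      # exploit best CTR
                   key=lambda a: self.clicks[context, a]
                                 / max(self.shows[context, a], 1.0))

    def update(self, context, layout, clicked):
        self.shows[context, layout] += 1
        self.clicks[context, layout] += clicked
```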

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Lihong Li, Wei Chu, John Langford, Xuanhui Wang
2011 Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11  
Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general.  ...  In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation.  ...  INTRODUCTION Web-based content recommendation services such as Digg, Yahoo! Buzz and Yahoo!  ... 
doi:10.1145/1935826.1935878 dblp:conf/wsdm/LiCLW11 fatcat:5c64oezojfdx3prp7hhb5whkty
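The replay idea is simple enough to sketch: stream logged (context, action, reward) triples collected under a uniformly random logging policy, and keep only the events where the candidate algorithm agrees with the logged action. Method names here are assumptions.

```python
def replay_evaluate(policy, logged_events):
    """Replay estimate of per-impression reward, assuming the logged
    actions were chosen uniformly at random (the paper's setting).
    `policy.choose` / `policy.update` are illustrative names."""
    total_reward, matched = 0.0, 0
    for context, logged_action, reward in logged_events:
        if policy.choose(context) == logged_action:   # keep matches only
            total_reward += reward
            matched += 1
            policy.update(context, logged_action, reward)
    return total_reward / max(matched, 1)
```

Because the logged actions are uniformly random, the matched events form an unbiased sample of the interaction the candidate policy would have generated online; any policy with a choose/update interface (e.g., the epsilon-greedy sketch above) can be plugged in.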

Bandit Algorithms in Information Retrieval

Dorota Glowacka
2019 Foundations and Trends in Information Retrieval  
Dorota Głowacka (2019), "Bandit Algorithms in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 13, No. 4, pp 299-424. DOI: 10.1561/1500000067.  ...  Chapter 3 summarizes bandit algorithms inspired by three click models: the Cascade Model (Section 3.1), the Dependent Click Model (Section 3.2) and the Position Based Model (Section 3.3).  ...  "A contextual-bandit algorithm for mobile context-aware recommender system". In: International Conference on Neural Information Processing. Springer. 324-331. Bresler, G., G. H. Chen, and D.  ...
doi:10.1561/1500000067 fatcat:api5ljs5abbwdckujtsgwp27o4
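For intuition about the click models the monograph builds on, here is a toy Position-Based Model simulator, a sketch under assumed examination and attraction probabilities: a click requires both that the position be examined and that the item be attractive.

```python
import random

# Illustrative Position-Based Model: examination depends only on rank,
# attraction only on the item; both tables below are assumptions.
def pbm_clicks(ranked_items, examine, attract):
    clicks = []
    for pos, item in enumerate(ranked_items):
        examined = random.random() < examine[pos]
        attracted = random.random() < attract[item]
        clicks.append(examined and attracted)
    return clicks

# e.g. pbm_clicks(["a", "b", "c"], examine=[0.9, 0.6, 0.3],
#                 attract={"a": 0.5, "b": 0.2, "c": 0.4})
```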

Offline Evaluation and Optimization for Interactive Systems

Lihong Li
2015 Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM '15  
...data a warm-start policy better than random • non-exploration data • 35M impressions for training • 19M impressions for test • 880K ads • 3.4M distinct webpages • reward ∈ {0,1}: click or not. Three Algorithms  ...  to measure success: click metrics are hard to work with offline (because of their counterfactual nature); the standard solution is an A/B test, but that is expensive. Speller as Contextual Bandit: U is the context and S the  ...  (vs. truly randomized data) • cheap, and potentially useful • but risky (by ignoring potential confounding) • need to design properly before collecting data. How to Design Exploration Distributions (2) • depends  ...
doi:10.1145/2684822.2697040 dblp:conf/wsdm/Li15 fatcat:2ap6hcpimfh5xogmdzif6ar6ri
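When the logged data come from a non-uniform but known exploration distribution, the standard fix is inverse propensity scoring. A minimal sketch, assuming each logged event records the probability p with which the logging policy chose its action (not the talk's exact estimator):

```python
def ips_estimate(policy_action, logged):
    """Inverse-propensity-scoring estimate of a policy's average reward.
    `logged` yields (context, action, reward, p) with p = logging
    probability of `action`; `policy_action` maps context -> action."""
    total, n = 0.0, 0
    for context, action, reward, p in logged:
        n += 1
        if policy_action(context) == action:
            total += reward / p        # reweight by inverse propensity
    return total / max(n, 1)
```

With non-randomized data there is no recorded p at all, which is exactly the confounding risk the slides warn about.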

Ensemble contextual bandits for personalized recommendation

Liang Tang, Yexi Jiang, Lei Li, Tao Li
2014 Proceedings of the 8th ACM Conference on Recommender systems - RecSys '14  
In this paper, we explore ensemble strategies of contextual bandit algorithms to obtain robust predicted click-through rate (CTR) of web objects.  ...  However, due to high-dimensional user/item features and the underlying characteristics of bandit policies, it is often difficult for service providers to obtain and deploy an appropriate algorithm to achieve  ...  ALGORITHM This section presents two ensemble bandit algorithms, Hy-perTS and HyperTSFB, for solving the contextual recommendation problem in the cold-start situation.  ... 
doi:10.1145/2645710.2645732 dblp:conf/recsys/TangJLL14 fatcat:flmgmd3zqjcbvbymhzmha23tfa
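In the spirit of HyperTS (the paper's algorithm differs in details), one can run Thompson sampling over the base policies themselves: keep a Beta posterior on each policy's CTR, sample to decide which policy acts, and credit the observed click back to it. A hedged sketch assuming Bernoulli rewards:

```python
import random

# Ensemble-of-bandits sketch: the "arms" are whole base policies.
class PolicyEnsembleTS:
    def __init__(self, base_policies):
        self.policies = base_policies
        self.alpha = [1.0] * len(base_policies)   # 1 + clicks credited
        self.beta = [1.0] * len(base_policies)    # 1 + skips credited

    def choose(self, context):
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        self.last = max(range(len(samples)), key=samples.__getitem__)
        return self.policies[self.last].choose(context)

    def update(self, context, action, clicked):
        # Assumes choose() was called for this round first.
        self.alpha[self.last] += clicked
        self.beta[self.last] += 1 - clicked
        self.policies[self.last].update(context, action, clicked)
```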

A contextual-bandit approach to personalized news article recommendation

Lihong Li, Wei Chu, John Langford, Robert E. Schapire
2010 Proceedings of the 19th international conference on World wide web - WWW '10  
In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based  ...  First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory.  ...  CONCLUSIONS This paper takes a contextual-bandit approach to personalized web-based services such as news article recommendation.  ... 
doi:10.1145/1772690.1772758 dblp:conf/www/LiCLS10 fatcat:76lesokjgzc7vkl4sw5xr44ddy
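The paper's disjoint LinUCB variant maintains one ridge regression per article and adds an upper-confidence bonus to each point estimate. A compact sketch (alpha is the exploration parameter; the matrix inversion is kept naive for clarity):

```python
import numpy as np

# Disjoint LinUCB: score(arm) = theta_arm . x + alpha * sqrt(x' A^-1 x).
class LinUCB:
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]    # per-arm Gram matrix
        self.b = [np.zeros(d) for _ in range(n_arms)]  # per-arm responses

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, x, arm, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```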

Personalized Recommendation via Parameter-Free Contextual Bandits

Liang Tang, Yexi Jiang, Lei Li, Chunqiu Zeng, Tao Li
2015 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15  
In this paper, we formulate personalized recommendation as a contextual bandit problem to solve the exploration/exploitation dilemma.  ...  click-through rate.  ...  The reward is the user response (e.g., a click). Therefore, personalized recommendation can be seen as an instance of the contextual bandit problem.  ... 
doi:10.1145/2766462.2767707 dblp:conf/sigir/TangJLZL15 fatcat:gyuie4kilngm5draxc7tivnwlq
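One common way to avoid a hand-tuned exploration constant — an illustration of the parameter-free idea, not the paper's exact method — is Gaussian Thompson sampling on a shared linear CTR model: sample a parameter vector from the posterior and act greedily on the sample.

```python
import numpy as np

# Linear Thompson sampling sketch; A, b are shared ridge-regression
# statistics as in LinUCB, sigma2 is an assumed noise variance.
def ts_choose(A, b, arm_features, sigma2=0.25):
    A_inv = np.linalg.inv(A)
    mean = A_inv @ b
    theta = np.random.multivariate_normal(mean, sigma2 * A_inv)
    return int(np.argmax(arm_features @ theta))   # greedy on the sample
```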

A Map of Bandits for E-commerce [article]

Yi Liu, Lihong Li
2021 arXiv   pre-print
The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand.  ...  In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms.  ...  The reward depends on the context and the chosen action. The objective of a Bandit algorithm is to recommend actions at each step so as to maximize the expected cumulative reward over time, $\sum_{t=1}^{T} \mathbb{E}[r_t]$, where $T$ is the total number of steps.  ...
arXiv:2107.00680v1 fatcat:7gl37h4yrrbfhdy4q5eyk7usbq
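The reconstructed objective above corresponds to a plain agent-environment loop; the env and agent interfaces here are illustrative names, not from the paper.

```python
# Minimal interaction loop for the objective sum_{t=1..T} E[r_t].
def run(agent, env, T):
    total = 0.0
    for t in range(T):
        context = env.observe()
        action = agent.choose(context)
        reward = env.step(action)       # reward depends on context+action
        agent.update(context, action, reward)
        total += reward
    return total                        # realized cumulative reward
```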

Accelerated learning from recommender systems using multi-armed bandit [article]

Meisam Hejazinia, Kyler Eastman, Shuqin Ye, Abbas Amirabadi, Ravi Divvela
2019 arXiv   pre-print
The gold standard for evaluating recommendation algorithms has been the A/B test since it is an unbiased way to estimate how well one or more algorithms compare in the real world.  ...  We propose multi-armed bandit (MAB) testing as a solution to these issues.  ...  ACKNOWLEDGMENTS The authors would like to thank Travis Brady, Pavlos Mitsoulis Ntompos, Ben Dundee, Kurt Smith, and John Meakin for their internal review of this paper and their helpful feedback.  ...
arXiv:1908.06158v1 fatcat:7rp3l5ea25feliymdm6cyeuska
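A minimal sketch of MAB testing as an A/B-test replacement, assuming Bernoulli click feedback: Thompson sampling shifts traffic toward stronger variants while the experiment is still running, rather than splitting traffic evenly until a fixed horizon.

```python
import random

# Pick the variant to serve next from per-variant click/impression counts.
def thompson_pick(clicks, shows):
    samples = [random.betavariate(1 + c, 1 + s - c)
               for c, s in zip(clicks, shows)]
    return max(range(len(samples)), key=samples.__getitem__)

# e.g. thompson_pick(clicks=[12, 30], shows=[400, 410]) favors variant 1.
```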

Query Completion Using Bandits for Engines Aggregation [article]

Audrey Durand, Jean-Alexandre Beaumont, Christian Gagne, Michel Lemay, Sebastien Paquet
2017 arXiv   pre-print
We tackle this problem in the bandit setting and evaluate four strategies to overcome this challenge.  ...  There are many possible strategies for query auto-completion, and a challenge is to design one optimal engine that considers and uses all available information.  ...  Thanks to Coveo for providing the data and the infrastructure and to the Natural Sciences and Engineering Research Council of Canada (NSERC) for the research grant EGP 492531-15.  ...
arXiv:1709.04095v1 fatcat:n4skx7ld2zcxlpxmv6vg3mddry
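One simple strategy in this setting treats each auto-completion engine as an arm and selects with UCB1; a sketch (the paper's four strategies are not specified in the snippet, so this is an assumed baseline):

```python
import math

# UCB1 over engines: reward_sums[i] / pulls[i] is engine i's empirical
# quality; t is the total number of selections made so far (t >= 1).
def ucb1_pick(reward_sums, pulls, t):
    for i, n in enumerate(pulls):
        if n == 0:
            return i                           # try every engine once
    return max(range(len(pulls)),
               key=lambda i: reward_sums[i] / pulls[i]
                             + math.sqrt(2 * math.log(t) / pulls[i]))
```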

Reinforcement Learning for Online Information Seeking [article]

Xiangyu Zhao, Long Xia, Jiliang Tang, Dawei Yin
2019 arXiv   pre-print
In this paper, we give an overview of deep reinforcement learning for search, recommendation, and online advertising from methodologies to applications, review representative algorithms, and discuss some  ...  Search, recommendation, and online advertising are the three most important information-providing mechanisms on the web.  ...  sequentially for specific users based on the users' and articles' contextual information, in order to maximize the total user clicks.  ... 
arXiv:1812.07127v4 fatcat:pyc75g5hufcs5b3f75gonbkp24

Introduction to Multi-Armed Bandits

Aleksandrs Slivkins
2019 Foundations and Trends® in Machine Learning  
Examples (application: action → reward): medical trials: which drug to prescribe → healthy/not; web design: font color or page layout → #clicks; web content: items/articles to emphasize → #clicks; web search: search results given a query →  ...  The reward distribution is fixed (i.e., depends only on the chosen arm), but not known to the algorithm.  ...
doi:10.1561/2200000068 fatcat:5drse7hks5fuzd6hriwrlgp27a
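The IID model in the snippet is easy to make concrete: each arm has a fixed reward distribution that depends only on the arm and is hidden from the learner. A toy Bernoulli environment (means are made up; pairs naturally with a selector like ucb1_pick above):

```python
import random

class BernoulliBandit:
    def __init__(self, means):
        self._means = means            # fixed per arm, hidden from learner
    def pull(self, arm):
        return 1 if random.random() < self._means[arm] else 0

# e.g. env = BernoulliBandit([0.03, 0.05, 0.04]); r = env.pull(1)
```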

Contextual Bandit Applications in Customer Support Bot [article]

Sandra Sajeev, Jade Huang, Nikos Karampatziakis, Matthew Hall, Sebastian Kochman, Weizhu Chen
2021 arXiv   pre-print
It includes intent disambiguation based on neural-linear bandits (NLB) and contextual recommendations based on a collection of multi-armed bandits (MAB).  ...  Adaptable learning techniques, like contextual bandits, are a natural fit for this problem setting.  ...  Lessons from contextual bandit learning in a  ...  intent disambiguation and contextual recommendations for a virtual support agent.  ...
arXiv:2112.03210v1 fatcat:m4eig5htorge7oko3xgkfs3ffe
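A neural-linear bandit, roughly as the snippet describes it, runs Bayesian linear regression on features from a neural encoder. In this sketch the encoder is an arbitrary frozen feature map phi, and every numeric choice is an assumption rather than the paper's configuration.

```python
import numpy as np

class NeuralLinear:
    def __init__(self, phi, n_arms, d, sigma2=0.25):
        self.phi, self.sigma2 = phi, sigma2      # phi: raw input -> R^d
        self.A = [np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def choose(self, raw_input):
        z = self.phi(raw_input)
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = np.random.multivariate_normal(A_inv @ b,
                                                  self.sigma2 * A_inv)
            scores.append(theta @ z)             # Thompson sample per arm
        return int(np.argmax(scores))

    def update(self, raw_input, arm, reward):
        z = self.phi(raw_input)
        self.A[arm] += np.outer(z, z)
        self.b[arm] += reward * z
```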

DCM Bandits: Learning to Rank with Multiple Clicks [article]

Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen
2016 arXiv   pre-print
We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages.  ...  This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.  ...  DCM Bandits We propose a learning variant of the dependent click model (Section 3.1) and a computationally-efficient algorithm for solving it (Section 3.3).  ... 
arXiv:1602.03146v2 fatcat:yixxwteyn5ha5ayfugrf62juta
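The DCM itself is a small generative model, worth seeing in code: the user scans the list top-down, clicks attractive items, and after each click abandons the list with some position-dependent probability, so only the last click in a session can be the satisfied one. A toy simulator with made-up parameters:

```python
import random

# lam[pos] = probability of continuing to browse after a click at pos;
# attract maps items to attraction probabilities (both assumptions).
def dcm_session(ranking, attract, lam):
    clicks = []
    for pos, item in enumerate(ranking):
        if random.random() < attract[item]:
            clicks.append(pos)
            if random.random() > lam[pos]:   # satisfied: stop browsing
                break
    return clicks

# e.g. dcm_session(["a", "b", "c"], {"a": .5, "b": .3, "c": .4}, [.7, .7, .7])
```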
Showing results 1 — 15 out of 372 results