137 Hits in 6.6 sec

Dynamic allocation optimization in A/B tests using classification-based preprocessing

Emmanuelle Claeys, Pierre Gancarski, Myriam Maumy-Bertrand, Hubert Wassner
2021 IEEE Transactions on Knowledge and Data Engineering  
However, one problem with this approach is the non-adaptivity of the test.  ...  One way to avoid this is to apply a bandit-based algorithm. Such an algorithm is able to automatically decide whether a page should be chosen and applied more often than the other one.  ...  The bandit model The notion of bandit was introduced by Lai and Robbins [26] under the name of multi-armed bandit, but we use here only the term bandit to simplify the reading.  ... 
doi:10.1109/tkde.2021.3076025 fatcat:l6ifydtzdfgyjfyupwhajtit7a
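
The snippet above describes replacing a fixed 50/50 A/B split with a bandit-based rule that adaptively routes more traffic to the better-performing page. A minimal sketch of one such rule, Thompson sampling over two page variants with Beta priors on Bernoulli conversion rates (the algorithm choice, variable names, and simulated conversion rates are illustrative assumptions, not the paper's implementation):

    import random

    # Thompson sampling for a two-variant A/B test with click / no-click rewards.
    successes = [1, 1]  # Beta prior alpha for variants A and B
    failures = [1, 1]   # Beta prior beta

    def choose_variant():
        # Draw a plausible conversion rate for each variant; serve the best draw.
        draws = [random.betavariate(successes[i], failures[i]) for i in range(2)]
        return draws.index(max(draws))

    def update(variant, converted):
        # Posterior update after observing whether the visitor converted.
        if converted:
            successes[variant] += 1
        else:
            failures[variant] += 1

    # Each visitor is routed adaptively instead of with a fixed 50/50 split.
    for _ in range(1000):
        v = choose_variant()
        converted = random.random() < (0.05 if v == 0 else 0.07)  # simulated truth
        update(v, converted)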

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R [article]

Robin van Emden, Maurits Kaptein
2020 arXiv   pre-print
Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems---from online advertising and finance  ...  , easily extensible framework that facilitates parallelized comparison of contextual and context-free bandit policies through both simulation and offline analysis.  ...  , as of yet, no extensible and widely applicable R package to analyze and compare, respectively, basic multi-armed, continuum (Agrawal 1995) and contextual multi-armed bandit algorithms on both simulated  ... 
arXiv:1811.01926v4 fatcat:7im2nngh7jb2zk4vboyicqatra

Rarely-switching linear bandits: optimization of causal effects for the real world [article]

Benjamin Lansdell, Sofia Triantafillou, Konrad Kording
2019 arXiv   pre-print
In cases where a policy is a threshold on contextual variables, we can estimate treatment effects for populations lying at the threshold.  ...  Using this idea, and the theory of linear contextual bandits, we present a conservative policy updating procedure which updates a deterministic policy only when justified.  ...  URL http://arxiv.org/abs/1611.06426. Tomer Koren, Roi Livni, and Yishay Mansour. Multi-Armed Bandits with Metric Movement Costs. Advances in Neural Information Processing Systems, 30, 2017a.  ... 
arXiv:1905.13121v2 fatcat:6gr5zdi27zcwdgdkqaohbsiiti

An Empirical Study of Neural Kernel Bandits [article]

Michal Lisicki, Arash Afkanpour, Graham W. Taylor
2021 arXiv   pre-print
While in general contextual bandits commonly utilize Gaussian process (GP) predictive distributions for decision making, the most successful neural variants use only the last layer parameters in the derivation  ...  Neural bandits have enabled practitioners to operate efficiently on problems with non-linear reward functions.  ...  The authors thank the Canada Foundation for Innovation and Compute Canada for computing resources.  ... 
arXiv:2111.03543v1 fatcat:lmewgppoxzelxenjr2hk3itxym

Top-k eXtreme Contextual Bandits with Arm Hierarchy [article]

Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon
2021 arXiv   pre-print
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms.  ...  Motivated by modern applications, such as online advertisement and recommender systems, we study the top-k extreme contextual bandits problem, where the total number of arms can be enormous, and the learner  ...  We implement our eXtreme contextual bandit algorithm with a hierarchical linear function class and test the performance of different exploration strategies under our framework on eXtreme multi-label  ... 
arXiv:2102.07800v1 fatcat:fkojyurufvhjphnie5m52qzcym
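
The first snippet mentions the Inverse Gap Weighting (IGW) exploration strategy. A minimal single-arm version of IGW over predicted rewards is sketched below; the learning-rate parameter gamma and the example reward estimates are illustrative assumptions, and the paper's actual procedure selects multiple arms under an arm hierarchy.

    import random

    def igw_probabilities(predicted_rewards, gamma):
        # Inverse Gap Weighting: each non-greedy arm gets probability shrinking
        # with its gap to the greedy arm; the greedy arm absorbs the remainder.
        k = len(predicted_rewards)
        best = max(range(k), key=lambda a: predicted_rewards[a])
        probs = [0.0] * k
        for a in range(k):
            if a != best:
                probs[a] = 1.0 / (k + gamma * (predicted_rewards[best] - predicted_rewards[a]))
        probs[best] = 1.0 - sum(probs)
        return probs

    def igw_sample(predicted_rewards, gamma=100.0):
        probs = igw_probabilities(predicted_rewards, gamma)
        return random.choices(range(len(predicted_rewards)), weights=probs, k=1)[0]

    # With a large gamma, clearly worse arms are explored only rarely.
    print(igw_probabilities([0.9, 0.5, 0.4], gamma=100.0))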

Multi-Armed Bandits with Correlated Arms [article]

Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan
2020 arXiv   pre-print
We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.  ...  Rigorous analysis of C-UCB and C-TS (the correlated bandit versions of Upper-confidence-bound and Thompson sampling) reveals that the algorithms end up pulling certain sub-optimal arms, termed as non-competitive  ...  Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting.  ... 
arXiv:1911.03959v3 fatcat:rh5e7xou4rdypbbzxuguxesnwq

Greedy Algorithm almost Dominates in Smoothed Contextual Bandits [article]

Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu
2021 arXiv   pre-print
We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model.  ...  Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order  ...  Zhang, The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, in 21st Advances in Neural Information Processing Systems (NIPS), 2007. [22] T. Lattimore and C.  ... 
arXiv:2005.10624v2 fatcat:qc6glfbul5b35imusb6qh3q5wu

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [article]

Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani
2020 arXiv   pre-print
We test our proposed deep Bayesian bandits algorithm in the offline simulation and online A/B setting with large-scale production traffic, where we demonstrate a positive gain of our exploration model.  ...  In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that require sampling from the posterior distribution of click-through-rates in  ...  The exploration/exploitation trade-off is naturally formulated as a (contextual) multi-armed bandit task, for which an ϵ-greedy policy is a simple yet powerful approach.  ... 
arXiv:2008.00727v1 fatcat:eu5rpcg6abhmxlmg4ewnbi7zga
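
The abstract above notes that an ϵ-greedy policy is a simple yet powerful baseline for the exploration/exploitation trade-off. A minimal sketch, where the click-through-rate estimates and the value of ϵ are illustrative assumptions rather than the paper's production settings:

    import random

    def epsilon_greedy(estimated_ctr, epsilon=0.1):
        # With probability epsilon show a uniformly random ad (explore);
        # otherwise show the ad with the highest estimated click-through rate.
        if random.random() < epsilon:
            return random.randrange(len(estimated_ctr))
        return max(range(len(estimated_ctr)), key=lambda a: estimated_ctr[a])

    # Hypothetical CTR estimates for three candidate ads.
    chosen_ad = epsilon_greedy([0.021, 0.034, 0.018], epsilon=0.05)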

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting [article]

Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan
2021 arXiv   pre-print
We prove via regret analysis that our proposed UCB-C algorithm (the structured bandit version of UCB) pulls only a subset of the sub-optimal arms O(log T) times while the other sub-optimal arms (referred  ...  This approach enables us to fundamentally generalize any classic bandit algorithm, including UCB and Thompson Sampling, to the structured bandit setting.  ...  any other method designed for classical multi-armed bandits with independent arms.  ... 
arXiv:1810.08164v7 fatcat:ftxpgbxifbemfj2amojqcsthfy

An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits [article]

Kei Takemura, Shinji Ito
2019 arXiv   pre-print
Combinatorial linear semi-bandits (CLS) are widely applicable frameworks of sequential decision-making, in which a learner chooses a subset of arms from a given set of arms associated with feature vectors  ...  Our empirical evaluation with artificial and real-world datasets demonstrates that the proposed algorithms with the arm-wise randomization technique outperform the existing algorithms without this technique  ...  Jain, “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear  ... 
arXiv:1909.02251v2 fatcat:nwsweuaetbcmtplcvli4unr4ru

Introduction to Multi-Armed Bandits [article]

Aleksandrs Slivkins
2022 arXiv   pre-print
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty.  ...  The next three chapters cover adversarial rewards, from the full-feedback version to adversarial bandits to extensions with linear rewards and combinatorially structured actions.  ...  While these two lower bounds essentially resolve the basic version of multi-armed bandits, they do not suffice for many other versions.  ... 
arXiv:1904.07272v7 fatcat:pptyhyyshrdyhhf7bdonz5dsv4
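
As a companion to the framework described in this survey, the basic stochastic bandit protocol can be illustrated with a generic UCB1 index policy; this is a standard textbook algorithm, not code taken from the survey, and the reward simulator, arm means, and horizon below are assumptions for the example.

    import math
    import random

    def ucb1(num_arms, pull, horizon):
        # Generic UCB1: play each arm once, then repeatedly pick the arm with
        # the largest empirical mean plus confidence bonus. pull(a) returns a
        # reward in [0, 1] for arm a.
        counts = [0] * num_arms
        sums = [0.0] * num_arms
        for t in range(1, horizon + 1):
            if t <= num_arms:
                arm = t - 1  # initialisation: try every arm once
            else:
                arm = max(range(num_arms),
                          key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            reward = pull(arm)
            counts[arm] += 1
            sums[arm] += reward
        return sums, counts

    # Example: three Bernoulli arms with hypothetical means.
    means = [0.3, 0.5, 0.7]
    ucb1(3, lambda a: float(random.random() < means[a]), horizon=10_000)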

Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users [article]

Shijun Li, Wenqiang Lei, Qingyun Wu, Xiangnan He, Peng Jiang, Tat-Seng Chua
2021 arXiv   pre-print
., multi-armed bandit approach, addresses this limitation by interactively exploring user preference online and pursuing the exploration-exploitation (EE) trade-off.  ...  Our Conversational Thompson Sampling (ConTS) model holistically solves all questions in conversational recommendation by choosing the arm with the maximal reward to play.  ...  However, bandit algorithms only work in small arm pools, which usually requires a separate pre-processing step on the candidate pool [6, 29].  ... 
arXiv:2005.12979v4 fatcat:m4l54jeco5cdtd4vnxtfhicoxy

Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods [article]

Samuel Kessler, Arnold Salas, Vincent W. C. Tan, Stefan Zohren, Stephen Roberts
2020 arXiv   pre-print
We also demonstrate the quality of the derived uncertainty measures by comparing the performance of Badam to standard methods in a Thompson sampling setting for multi-armed bandits, where good uncertainty  ...  We introduce a novel framework for the estimation of the posterior distribution over the weights of a neural network, based on a new probabilistic interpretation of adaptive optimisation algorithms such  ...  The experimental setup for the multi-armed bandits proceeds as follows: at each round a new context from the dataset is presented to the bandit algorithm; we go through the dataset once.  ... 
arXiv:1811.03679v3 fatcat:f4kjgcns4jghxicddjtggxcbx4
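
The last snippet describes the bandit evaluation protocol: a single pass over the dataset in which each context is shown to the algorithm once and only the chosen arm's reward is revealed. Below is a minimal sketch of that loop for a Thompson-sampling-style agent; the sample_posterior and update_posterior callables and the toy data are placeholders, not the paper's Badam model.

    import numpy as np

    def thompson_pass(contexts, rewards, sample_posterior, update_posterior):
        # One pass over the dataset: per round, draw one posterior sample of the
        # reward for every arm, play the argmax arm, and update the model with
        # that arm's observed reward only (bandit feedback).
        total = 0.0
        for x, r in zip(contexts, rewards):      # r[a] = reward of arm a for context x
            draws = sample_posterior(x)          # one posterior draw per arm
            arm = int(np.argmax(draws))
            total += r[arm]
            update_posterior(x, arm, r[arm])
        return total

    # Toy usage with a stateless "posterior" that just returns random draws.
    contexts = np.random.randn(100, 5)
    rewards = np.random.rand(100, 3)
    thompson_pass(contexts, rewards, lambda x: np.random.rand(3), lambda x, a, r: None)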

Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability [article]

David Simchi-Levi, Yunzong Xu
2021 arXiv   pre-print
This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.  ...  We design a fast and simple algorithm that achieves the statistically optimal regret with only O(log T) calls to an offline regression oracle across all T rounds.  ...  Acknowledgments The authors would like to thank the review team for their helpful comments; in particular, for pointing out some observations in §3.2.  ... 
arXiv:2003.12699v5 fatcat:ylydspt4d5b4vf2br7o5ugjewi

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach [article]

Chen-Yu Wei, Haipeng Luo
2021 arXiv   pre-print
By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized  ...  algorithms, but also significantly improves the state of the art for (generalized) linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways.  ...  us their non-stationary linear bandit algorithm with the Reg_Δ bound [Cheung et al., 2018].  ... 
arXiv:2102.05406v3 fatcat:ur57whvga5cvfhvxyoyfctxhqy
Showing results 1 — 15 out of 137 results