
Tuning Confidence Bound for Stochastic Bandits with Bandit Distance [article]

Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado
2021 arXiv   pre-print
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance  ...  "Distance tuning" of the standard UCB is done using a proposed distance measure, which we call bandit distance, that is parameterizable and which therefore can be optimized to control the transition rate  ...  We propose the UCB-DT policy, which tunes the confidence bound by bandit distance.  ...
arXiv:2110.02690v1 fatcat:tzst3wjevbcyxevkkbokoqd37y
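
The abstract's "distance tuning" idea can be sketched as a standard UCB index whose exploration term is scaled per arm. This is a minimal illustration, assuming a multiplicative scaling and a caller-supplied distance array; the paper's actual bandit-distance measure and tuning form may differ.

```python
import math

def ucb_dt_pick(counts, means, distance, t):
    """Pick an arm with a distance-tuned UCB index (illustrative only).

    counts[i]   : number of pulls of arm i so far
    means[i]    : empirical mean reward of arm i
    distance[i] : a parameterizable per-arm "bandit distance" in [0, 1]
                  (hypothetical form; the paper defines its own measure)
    t           : current round, t >= 1
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every arm once before tuning kicks in

    def index(i):
        bonus = math.sqrt(2.0 * math.log(t) / counts[i])
        return means[i] + distance[i] * bonus  # distance scales the bound

    return max(range(len(counts)), key=index)
```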

Bayesian Unification of Gradient and Bandit-Based Learning for Accelerated Global Optimisation

Ole-Christoffer Granmo
2016 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)  
However, for continuous optimisation problems or problems with a large number of actions, bandit based approaches can be hindered by slow learning.  ...  We further propose an accompanying bandit driven exploration scheme that uses Bayesian credible bounds to trade off exploration against exploitation.  ...  setting: Thompson sampling (stochastic probability matching schemes) and those based on upper confidence (or credibility) bounds.  ... 
doi:10.1109/icmla.2016.0044 dblp:conf/icmla/Granmo16 fatcat:3ep5f5abnnho7awhdrgcchfjou
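
The credible-bound exploration scheme mentioned in the snippet can be illustrated with a per-arm Gaussian model. The moment-based variance estimate and the width multiplier z below are assumptions for illustration, not the paper's unified gradient/bandit construction.

```python
import math

def credible_bound_pick(counts, sums, sumsqs, z=2.0):
    """Pick the arm with the highest Gaussian upper credible bound.

    counts[i], sums[i], sumsqs[i]: pull count, reward sum, and sum of
    squared rewards for arm i.  z sets the credibility level (assumed;
    the paper derives its bounds from an explicit Bayesian model).
    """
    best_arm, best_bound = None, -float("inf")
    for i, n in enumerate(counts):
        if n < 2:
            return i  # need two samples for a variance estimate
        mean = sums[i] / n
        var = max(sumsqs[i] / n - mean * mean, 1e-12)
        bound = mean + z * math.sqrt(var / n)  # upper credible bound
        if bound > best_bound:
            best_arm, best_bound = i, bound
    return best_arm
```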

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [article]

Giuseppe Burtini, Jason Loeppky, Ramon Lawrence
2015 arXiv   pre-print
We first explore the traditional stochastic model of a multi-armed bandit, then explore a taxonomic scheme of complications to that model, for each complication relating it to a specific requirement or  ...  Finally, at the end of the paper, we present a table of known upper-bounds of regret for all studied algorithms providing both perspectives for future theoretical work and a decision-making tool for practitioners  ...  Discounted UCB(-T) Discounted UCB and Discounted UCB-Tuned [85, 59] build on the work of UCB1 and UCB-Tuned for the original stochastic bandit problem, modifying the uncertainty padding estimate (the  ... 
arXiv:1510.00757v4 fatcat:eyxqdq3yl5fpdbv53wtnkfa25a
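
As a sketch of the Discounted UCB family the survey describes, the index below replaces plain counts and means with geometrically discounted ones, so the uncertainty padding re-inflates for arms that have not been played recently. The constants follow common choices in the literature, not necessarily those of [85, 59].

```python
import math

def discounted_ucb_index(pulls, rewards, t, gamma=0.99, B=1.0, xi=0.6):
    """Discounted UCB index for a single arm (sketch).

    pulls   : rounds s in 1..t at which this arm was played
    rewards : reward observed at each of those rounds, bounded by B
    gamma   : discount factor in (0, 1); recent observations weigh more
    """
    if not pulls:
        return float("inf")  # unplayed arms are tried first
    n_arm = sum(gamma ** (t - s) for s in pulls)  # discounted pull count
    mean = sum(gamma ** (t - s) * r for s, r in zip(pulls, rewards)) / n_arm
    n_all = (1.0 - gamma ** t) / (1.0 - gamma)    # discounted horizon
    padding = 2.0 * B * math.sqrt(xi * math.log(n_all) / n_arm)
    return mean + padding
```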

Towards Practical Lipschitz Bandits [article]

Tianyu Wang, Weicheng Ye, Dawei Geng, Cynthia Rudin
2020 arXiv   pre-print
Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains.  ...  In light of our analysis, we design a novel hierarchical Bayesian model for Lipschitz bandit problems.  ...  Acknowledgement The authors are grateful to Aaron J Fisher and Tiancheng Liu for insightful discussions. The authors thank anonymous reviewers for valuable feedback.  ... 
arXiv:1901.09277v6 fatcat:xumjia5bwjct3ovmggyyoqd5xy

Bandit Algorithms for Precision Medicine [article]

Yangyi Lu, Ziping Xu, Ambuj Tewari
2021 arXiv   pre-print
With their roots in the seminal work of Bellman, Robbins, Lai and others, bandit algorithms have come to occupy a central place in modern data science (Lattimore and Szepesvari, 2020).  ...  The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular  ...  At every round, it estimates the upper confidence bound for the expected reward and the lower confidence bound for the resource consumption for each arm.  ...
arXiv:2108.04782v1 fatcat:dni5wyzyerestgs3upuzz776n4
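
A minimal sketch of the round structure described in the snippet, pairing an upper confidence bound on reward with a lower confidence bound on resource consumption. The optimistic reward-to-cost ratio rule is one common instantiation of this pattern and is assumed here for illustration.

```python
import math

def optimistic_ratio_pick(counts, reward_means, cost_means, t):
    """Pick the arm with the best optimistic reward-to-cost ratio.

    Pairs a UCB on expected reward with an LCB on expected resource
    consumption, as the snippet describes; the ratio rule itself is an
    assumed instantiation, not necessarily the monograph's exact policy.
    """
    best_arm, best_ratio = None, -float("inf")
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every arm once
        pad = math.sqrt(2.0 * math.log(t) / n)
        reward_ucb = reward_means[i] + pad
        cost_lcb = max(cost_means[i] - pad, 1e-6)  # keep the ratio finite
        ratio = reward_ucb / cost_lcb
        if ratio > best_ratio:
            best_arm, best_ratio = i, ratio
    return best_arm
```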

Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits

Paul B. Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
2014 Proceedings of the IEEE  
For the multiarmed bandit problem with transition costs and the multiarmed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively  ...  We address the standard multiarmed bandit problem, the multiarmed bandit problem with transition costs, and the multiarmed bandit problem on graphs.  ...  Swain for their help with implementing the online experiment.  ... 
doi:10.1109/jproc.2014.2307024 fatcat:6xwlrab5ynbu5ag7qnjj544ihq

Nonparametric Stochastic Contextual Bandits [article]

Melody Y. Guan, Heinrich Jiang
2018 arXiv   pre-print
We then give global intrinsic dimension dependent and ambient dimension independent regret bounds.  ...  We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions.  ...  As before we use the UCB strategy in Auer et al. (2002) and fix the confidence level to 0.1. We do not employ any data augmentation.  ... 
arXiv:1801.01750v1 fatcat:wwhqfypcujbprgov5mvqx3zjii
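
For reference, the UCB strategy of Auer et al. (2002) cited in the snippet selects, at round t, the arm maximizing the empirical mean plus a count-based padding:

```latex
I_t \;=\; \operatorname*{arg\,max}_{i}\; \Bigl( \bar{x}_i + \sqrt{\tfrac{2 \ln t}{n_i}} \Bigr)
```

where \bar{x}_i is the empirical mean reward of arm i and n_i its pull count; the paper additionally fixes the confidence level to 0.1 in its experiments.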

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits [article]

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard
2019 arXiv   pre-print
For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm,  ...  We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs.  ...  Swain for their help with implementing the online experiment.  ... 
arXiv:1307.6134v5 fatcat:gpucau2sxzb4dj2p3bh3jksloi

Fairness of Exposure in Stochastic Bandits [article]

Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims
2021 arXiv   pre-print
We formulate fairness regret and reward regret in this setting, and present algorithms for both stochastic multi-armed bandits and stochastic linear bandits.  ...  Contextual bandit algorithms have become widely used for recommendation in online systems (e.g. marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed  ...  For the FairX bandit problem, we present a fair upper confidence bound (UCB) algorithm and a fair Thompson sampling (TS) algorithm in the stochastic multi-armed bandits (MAB) setting, as well as a fair  ... 
arXiv:2103.02735v2 fatcat:qk6nkmsoqbaxjmx5gzxihped3m

Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback [article]

Junxiong Wang, Debabrota Basu, Immanuel Trummer
2022 arXiv   pre-print
We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedback with different noise  ...  Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms.  ...  In the following sections, we incrementally design such BANDIT confidence bounds and derive corresponding error bounds for PCTS.  ...
arXiv:2110.07232v2 fatcat:aypluotofffe7mjntmwtxigmr4

Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe [article]

Quentin Berthet, Vianney Perchet
2017 arXiv   pre-print
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback.  ...  To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization.  ...  The UCB algorithm instructs to pick the action with the smallest lower confidence estimate µ_{t,i} for the loss.  ...
arXiv:1702.06917v2 fatcat:qrcajzvmyrakhgqk5yepivvdfy
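
The selection rule quoted in the snippet, written out for loss minimization (the Hoeffding-style width below is an assumed standard choice; the paper's exact confidence estimate may differ):

```latex
I_t \;=\; \operatorname*{arg\,min}_{i}\; \Bigl( \hat{\mu}_{t,i} - \sqrt{\tfrac{2 \ln t}{T_i(t)}} \Bigr)
```

with \hat{\mu}_{t,i} the empirical loss estimate and T_i(t) the pull count of action i at round t.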

A Multi-armed Bandit Approach to Online Spatial Task Assignment

Umair Ul Hassan, Edward Curry
2014 2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops  
We address this challenge by defining a framework for online spatial task assignment based on the multi-armed bandit formalization of the problem.  ...  However, outcome of an assignment is stochastic since the worker can choose to accept or reject the task.  ...  The confidence bound is a function of the worker success rate.  ... 
doi:10.1109/uic-atc-scalcom.2014.68 dblp:conf/uic/HassanC14 fatcat:qzassf3t3jc2xidnvare3lxdli

Feedback based Quality Enhancing Query Suggestion in E-Commerce Environment

2015 International Journal of Science and Research (IJSR)  
Our algorithm is based on "Thompson Sampling", a technique designed for solving multi-arm bandit problems where the best results are not known in advance but instead are tried out to gather feedback.  ...  Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent.  ...  In contrast, adding new actions into a UCB implementation, which chooses the action with the highest average plus confidence bound, results in that new action being chosen at every request until the confidence  ...
doi:10.21275/v4i11.sub158796 fatcat:qtfkdeunfnbzreioifcu67rhxa
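
The Thompson Sampling mechanism the snippet contrasts with UCB can be sketched for Bernoulli click feedback; the Beta(1, 1) prior and the click/no-click framing are illustrative assumptions.

```python
import random

def thompson_pick(successes, failures):
    """Thompson Sampling over suggestions with Bernoulli click feedback.

    successes[i] / failures[i]: click / no-click counts for suggestion i.
    Each arm is sampled from its Beta posterior; a brand-new suggestion
    starts at the Beta(1, 1) prior and competes immediately.
    """
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Because a brand-new suggestion is merely sampled from its wide prior, it competes probabilistically instead of being deterministically force-played, which is exactly the contrast the snippet draws with a UCB implementation.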

Near-optimal inference in adaptive linear regression [article]

Koulik Khamaru, Yash Deshpande, Lester Mackey, Martin J. Wainwright
2021 arXiv   pre-print
We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.  ...  We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators.  ...  Figure 2 illustrates the performance of online debiasing with bandit tuning (32) and δ = 0.05. Here we consider a two-armed bandit problem (29) with arm-mean vector  ...
arXiv:2107.02266v2 fatcat:zerchyzdvrhxtnlglfdi5mawwe

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit [article]

Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu
2021 arXiv   pre-print
We prove that, for multi-armed bandit, kernel contextual bandit and neural tangent kernel bandit, ROFU achieves (near-)optimal regret bounds with certain uncertainty measure, which theoretically justifies  ...  OFU has achieved (near-)optimal regret bound for linear/kernel contextual bandits.  ...  For the confidence bound method, it is unclear how to design the frequency for non-linear reward functions.  ...
arXiv:2106.15128v1 fatcat:33o2ooy7ffd3jlqb5orrqjuqha
Showing results 1 — 15 out of 886 results