2,123 Hits in 5.5 sec

Batched Multi-armed Bandits Problem [article]

Zijun Gao, Yanjun Han, Zhimei Ren, Zhengqing Zhou
2019 arXiv   pre-print
In this paper, we study the multi-armed bandit problem in the batched setting where the employed policy must split data into a small number of batches.  ...  In this paper, we propose the BaSE (batched successive elimination) policy to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even  ...  In this paper we study the influence of round constraints on the learning performance via the following batched multi-armed bandit problem.  ... 
arXiv:1904.01763v3 fatcat:lqswsp5lvjgaxix3j3kbpvto4m
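The core idea behind batched elimination policies like BaSE can be illustrated with a toy sketch: within each batch every surviving arm is pulled equally, and between batches arms whose empirical mean falls clearly below the leader's (by a confidence-radius test) are eliminated. This is an illustrative simplification, not the authors' exact BaSE policy; the batch sizes, confidence parameter, and function name are assumptions for the example.

```python
import math
import random

def batched_successive_elimination(means, num_batches=4, batch_size=500, seed=0):
    """Toy batched successive elimination on Gaussian arms (unit variance).

    Pull every surviving arm `batch_size` times per batch, then drop arms
    whose upper confidence bound falls below the leader's lower bound.
    Returns the list of arms still active after the last batch.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    totals = [0.0] * k   # sum of observed rewards per arm
    counts = [0] * k     # number of pulls per arm
    for _ in range(num_batches):
        for arm in active:
            for _ in range(batch_size):
                totals[arm] += rng.gauss(means[arm], 1.0)
                counts[arm] += 1
        # Confidence radius ~ sqrt(2 log(1/delta) / n); delta is fixed here.
        radius = {a: math.sqrt(2 * math.log(1e4) / counts[a]) for a in active}
        best = max(active, key=lambda a: totals[a] / counts[a])
        best_lcb = totals[best] / counts[best] - radius[best]
        active = [a for a in active
                  if totals[a] / counts[a] + radius[a] >= best_lcb]
    return active
```

With well-separated arms, only the best arm typically survives the first batch, so later batches spend all pulls on it, which is the source of the small-batch regret savings the paper analyzes.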

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem [article]

Nadav Merlis, Shie Mannor
2020 arXiv   pre-print
We consider the combinatorial multi-armed bandit (CMAB) problem, where the reward function is nonlinear.  ...  In this setting, the agent chooses a batch of arms on each round and receives feedback from each arm of the batch.  ...  Introduction The multi-armed bandit (MAB) problem is one of the most elementary problems in decision making under uncertainty.  ... 
arXiv:1905.03125v4 fatcat:dqwkyiztjje6bgdcz7sp3rlrga

Multi-Armed Bandit Problem and Batch UCB Rule [article]

Alexander Kolnogorov, Sergey Garbar
2019 arXiv   pre-print
We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes.  ...  Bather for the multi-armed bandit problem and using UCB rule, i.e. choosing the action corresponding to the maximum of the upper bound of the confidence interval of the current estimate of the expected  ...  We consider the multi-armed bandit problem.  ... 
arXiv:1902.00214v1 fatcat:4h6xo5hg2fd3ja7unl5itpxxey
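The UCB rule described in this snippet can be sketched as follows for Gaussian arms with known variance: play each arm once, then repeatedly choose the arm maximizing its empirical mean plus a confidence width. This is a generic UCB sketch under assumed parameters, not the specific batch UCB rule of the paper.

```python
import math
import random

def ucb_gaussian(means, horizon=2000, sigma=1.0, seed=1):
    """Classic UCB for Gaussian arms with known variance sigma^2.

    After one initial pull per arm, always play the arm maximizing
    empirical mean + sigma * sqrt(2 * log(t) / n_arm).
    Returns the per-arm pull counts over the horizon.
    """
    rng = random.Random(seed)
    k = len(means)
    sums = [rng.gauss(m, sigma) for m in means]  # one initial pull per arm
    counts = [1] * k
    for t in range(k + 1, horizon + 1):
        ucb = [sums[a] / counts[a]
               + sigma * math.sqrt(2 * math.log(t) / counts[a])
               for a in range(k)]
        arm = max(range(k), key=lambda a: ucb[a])
        sums[arm] += rng.gauss(means[arm], sigma)
        counts[arm] += 1
    return counts
```

On long horizons the pull counts concentrate on the best arm, with suboptimal arms receiving only logarithmically many pulls, which is what drives the loss (regret) bounds studied in this line of work.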

Regret Bounds for Batched Bandits [article]

Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni
2020 arXiv   pre-print
We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems.  ...  We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.  ...  Contributions and Paper Outline We provide analytic regret bounds for the batched version of three bandit problems: stochastic multi-armed bandits, stochastic linear bandits, and adversarial multi-armed  ... 
arXiv:1910.04959v2 fatcat:j2xfhrqna5b33egatvmq5uqlx4

Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits [article]

Sergey Garbar
2022 arXiv   pre-print
We consider the upper confidence bound strategy for Gaussian multi-armed bandits with known control horizon sizes N and build its limiting description with a system of stochastic differential equations  ...  Rewards for the arms are assumed to have unknown expected values and known variances.  ...  Introduction We consider a multi-armed bandit (MAB) problem. MAB can be viewed as a slot machine with J arms. Each one of the arms can be selected for play, which yields some random income (reward).  ... 
arXiv:2112.06423v2 fatcat:qpn3zy7jovffxhvb2w7wb2gszm

Contextual Bandit with Adaptive Feature Extraction [article]

Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Irina Rish
2020 arXiv   pre-print
We consider an online decision making setting known as contextual bandit problem, and propose an approach for improving contextual bandit performance by using an adaptive feature extraction (representation  ...  both the bandit and the encoding function based on the context and on the feedback (reward).  ...  RELATED WORK The multi-armed bandit problem has been extensively studied.  ... 
arXiv:1802.00981v4 fatcat:7z7r7xgnlvebxcspatchyttzc4

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward [article]

Baihan Lin
2020 arXiv   pre-print
Lastly, we introduced a relevant real-life example where this problem setting is especially useful.  ...  experiments on a variety of datasets, both in stationary and nonstationary environments of six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit  ...  This framework is usually formulated as the Multi-Armed Bandits (MAB) problem where each arm of the bandit corresponds to an unknown (but usually fixed) reward probability distribution [1, 2] , and the  ... 
arXiv:2009.08457v2 fatcat:vhvcravl2fg7dmyjoptfwsrnmu

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback [article]

Zongqi Wan, Xiaoming Sun, Jialin Zhang
2022 arXiv   pre-print
In particular, for the K-armed bandit and bandit convex optimization, we obtain an 𝒪(T^{2/3}) policy regret bound. We also prove a matching lower bound for the K-armed bandit.  ...  We study the adversarial bandit problem with composite anonymous delayed feedback.  ...  Introduction Multi-armed bandit is a widely studied problem. It can be formulated as a multi-round game between two players, an adversary and a learner.  ... 
arXiv:2204.12764v2 fatcat:uwumzddq2ra7zclmady4wx2goe

Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits [article]

Julia Kreutzer, David Vilar, Artem Sokolov
2021 arXiv   pre-print
A multi-armed bandit is trained to dynamically choose between facets in a way that is most beneficial for the MT system.  ...  We find that bandit learning leads to competitive MT systems across tasks, and our analysis provides insights into its learned strategies and the underlying data sets.  ...  We formulate multi-faceted training as a multi-armed bandit learning problem, where the arms/actions correspond to the available facets in the training data.  ... 
arXiv:2110.06997v1 fatcat:bwthq4twvbdznnloshegk7r4ue

The Impact of Batch Learning in Stochastic Linear Bandits [article]

Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
2022 arXiv   pre-print
Also, we provide a more robust result for the 2-armed bandit problem as an important insight.  ...  We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period.  ...  We consider the stochastic multi-armed bandit problem and examine the UCB family of algorithms. Lemma A.3 (Lemma 1.2, (Agrawal et al., 2016) ).  ... 
arXiv:2202.06657v1 fatcat:6a732dpmuja6vnwlaoyieb7qbu

MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation [article]

Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi
2020 arXiv   pre-print
It is captured by the K-armed dueling bandit problem, which is a variant of the K-armed bandit problem, where the feedback comes in the form of pairwise preferences.  ...  Today's deployed search systems can evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers is a critical aspect of K-armed dueling bandit problems.  ...  PROBLEM SETTING In this section, we first describe in more precise terms the K-armed dueling bandit problem, which is a variation of the multi-armed bandit (MAB) problem.  ... 
arXiv:1812.04412v2 fatcat:jgtm6ukpknh3ppmsjyhmfkvh7u

Hospital Admission Location Prediction via Deep Interpretable Networks for the Year-round Improvement of Emergency Patient Care

Rasheed El-Bouri, David Eyre, Peter J. Watkinson, Tingting Zhu, David Clifton
2020 IEEE Journal of Biomedical and Health Informatics
A novel deep learning training strategy was created that combines learning via curriculum and a multi-armed bandit to exploit this curriculum post-initial training.  ...  The problem is posed as a multi-class classification into seven separate ward types.  ...  Algorithm 1 shows how the multi-armed bandit problem was applied for training.  ... 
doi:10.1109/jbhi.2020.2990309 pmid:32750898 fatcat:zr34cv3znjemvocnmtzrgqs6ka

Accelerated learning from recommender systems using multi-armed bandit [article]

Meisam Hejazinia, Kyler Eastman, Shuqin Ye, Abbas Amirabadi, Ravi Divvela
2019 arXiv   pre-print
We argue for multi-armed bandit (MAB) testing as a solution to these issues.  ...  MULTI-ARMED BANDIT ARCHITECTURE AND PROCESS Our daily mini-batch MAB training pipeline consists of three main processes: reward attribution, traffic proportion mini-batch process, and a randomized online  ...  Empirical Evaluation of Recommendation Algorithms The variant of multi-armed bandit that is popular in industry is the contextual bandit, for its ability to handle the cold-start problem at scale [58] .  ... 
arXiv:1908.06158v1 fatcat:7rp3l5ea25feliymdm6cyeuska
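A common industrial choice for this kind of mini-batch MAB testing is Thompson sampling over Bernoulli rewards (e.g. clicks), where each arm keeps a Beta posterior that is updated as feedback arrives. The sketch below is a generic illustration of that pattern, not the paper's production pipeline; the function names and round counts are assumptions.

```python
import random

def thompson_step(successes, failures, rng):
    """One Thompson-sampling decision for Bernoulli arms: sample a plausible
    success rate from each arm's Beta posterior and play the argmax."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run_thompson(true_rates, rounds=3000, seed=2):
    """Simulate Thompson sampling against fixed Bernoulli arms.
    Returns the number of times each arm was played."""
    rng = random.Random(seed)
    k = len(true_rates)
    succ, fail = [0] * k, [0] * k
    for _ in range(rounds):
        a = thompson_step(succ, fail, rng)
        if rng.random() < true_rates[a]:
            succ[a] += 1
        else:
            fail[a] += 1
    return [s + f for s, f in zip(succ, fail)]
```

Because the posterior updates are simple counts, this scheme batches naturally: a daily job can aggregate the day's successes and failures per arm and recompute traffic proportions, which matches the mini-batch training loop the snippet describes.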

Advancements in Dueling Bandits

Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem.  ...  The dueling bandits problem is an online learning framework where learning happens "on-the-fly" through preference feedback, i.e., from comparisons between a pair of actions.  ...  For this setting, the Self-Sparring algorithm reduces the multi-dueling bandits problem to a conventional multi-armed bandit problem that can be solved using a stochastic bandit algorithm  ... 
doi:10.24963/ijcai.2018/776 dblp:conf/ijcai/SuiZHY18 fatcat:vfao6bpxt5aifbwyvtk3wg2cu4
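The reduction mentioned in the snippet, in the spirit of Self-Sparring, can be sketched as follows: each arm keeps a Beta posterior over its win rate, two arms are drawn by independent Thompson samples and dueled, and the winner/loser outcome is fed back as ordinary bandit feedback. This is an illustrative simplification of the idea, with assumed preference probabilities and parameters.

```python
import random

def self_sparring(pref, rounds=4000, seed=3):
    """Self-Sparring-style reduction of dueling bandits to a stochastic MAB.

    pref[i][j] is the probability that arm i beats arm j (pref[i][i] = 0.5).
    Each arm keeps Beta(wins + 1, losses + 1); every round two arms are drawn
    by Thompson sampling and dueled, and both posteriors are updated.
    Returns how often each arm entered a duel.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins, losses = [0] * k, [0] * k
    plays = [0] * k

    def draw():
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        return max(range(k), key=lambda a: samples[a])

    for _ in range(rounds):
        i, j = draw(), draw()
        plays[i] += 1
        plays[j] += 1
        if rng.random() < pref[i][j]:
            wins[i] += 1
            losses[j] += 1
        else:
            wins[j] += 1
            losses[i] += 1
    return plays
```

When a Condorcet winner exists, both Thompson draws concentrate on it over time, so most duels become self-comparisons of the best arm, mirroring how a stochastic bandit algorithm converges on the best single arm.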

Batched Dueling Bandits [article]

Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan
2022 arXiv   pre-print
The K-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied.  ...  We study the batched K-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality.  ...  The proof is similar to the lower bound proof in [19] for batched multi-armed bandits.  ... 
arXiv:2202.10660v1 fatcat:nxmboex2grd5nnrjrumoybzf74
Showing results 1 — 15 out of 2,123 results