Regret Bounds for Batched Bandits [article]

Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni
2020 arXiv   pre-print
We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batches.  ...  We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems.  ...  There is a non-adaptive algorithm for batched adversarial multi-armed bandits with regret bounded by E[Regret] ≤ O(√((K + T/B) · T · log(K))). Proof.  ... 
arXiv:1910.04959v2 fatcat:j2xfhrqna5b33egatvmq5uqlx4
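
The adversarial bound quoted above is the kind achievable by mini-batching Exp3: freeze the arm distribution within each batch, draw fresh arms independently across the batch's rounds, and make a single exponential-weights update per batch from the averaged importance-weighted losses. Below is a minimal Python sketch of that idea, assuming losses in [0, 1]; the learning rate eta, the uniform batch grid, and the loss_fn callback are illustrative choices, not the paper's algorithm.

    import numpy as np

    def batched_exp3(loss_fn, K, T, B, seed=None):
        """Mini-batched Exp3 sketch: one exponential-weights update per batch,
        using importance-weighted losses averaged over the batch's rounds."""
        rng = np.random.default_rng(seed)
        eta = np.sqrt(np.log(K) / (B * K))           # illustrative learning rate
        log_w = np.zeros(K)                          # log-weights for numerical stability
        ends = np.linspace(0, T, B + 1).astype(int)  # uniform batch grid (a choice)
        total_loss = 0.0
        for b in range(B):
            p = np.exp(log_w - log_w.max())
            p /= p.sum()                             # arm distribution, frozen within the batch
            est = np.zeros(K)
            for t in range(ends[b], ends[b + 1]):
                arm = rng.choice(K, p=p)             # fresh i.i.d. draw each round
                loss = loss_fn(t, arm)               # adversary's loss in [0, 1]
                total_loss += loss
                est[arm] += loss / p[arm]            # importance-weighted estimate
            n = max(ends[b + 1] - ends[b], 1)
            log_w -= eta * est / n                   # single update per batch
        return total_loss

Heuristically, resampling arms inside the batch (rather than committing to one arm per batch) reduces the variance of the averaged loss estimate, which is what lets the K and T/B terms appear additively inside the square root rather than multiplying each other.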

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem [article]

Nadav Merlis, Shie Mannor
2020 arXiv   pre-print
This, in turn, leads to much tighter regret bounds when the smoothness parameter is batch-size independent.  ...  We also prove matching lower bounds for the PMC problem and show that our algorithm is tight, up to a logarithmic factor in the problem's parameters.  ...  Acknowledgments The authors thank Asaf Cassel and Esther Derman for their helpful comments on the manuscript.  ... 
arXiv:1905.03125v4 fatcat:dqwkyiztjje6bgdcz7sp3rlrga

Batched Dueling Bandits [article]

Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan
2022 arXiv   pre-print
For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret.  ...  Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches.  ...  The proof is similar to the lower bound proof in [19] for batched multi-armed bandits.  ... 
arXiv:2202.10660v1 fatcat:nxmboex2grd5nnrjrumoybzf74

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret [article]

Raman Arora, Ambuj Tewari
2012 arXiv   pre-print
On the other hand, if the adversary's memory is bounded, we present a general technique that converts any bandit algorithm with a sublinear regret bound into an algorithm with a sublinear policy regret bound.  ...  We use this technique to derive a policy-regret bound of O(T^{2/3}) for the k-armed bandit problem, O(T^{4/5}) for bandit convex optimization, and O(T^{3/4}) for bandit linear optimization (or O(T^{2/3}) if the  ... 
arXiv:1206.6400v1 fatcat:4iy55rymujavhcsse5zhje3dce
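
The O(T^{2/3}) figure for the k-armed case can be recovered by a back-of-envelope balance, assuming the conversion is mini-batching (play one arm per block of length τ and feed block-average losses to a base algorithm with √(nK log K) regret over n rounds) and that an adversary with memory m only contaminates the first m rounds of each block; the constants and the base-algorithm bound here are assumptions, not quotations:

    % Back-of-envelope only; tau is the block length, m the adversary's memory.
    \[
    \underbrace{\tau\,\sqrt{\tfrac{T}{\tau}\,K\log K}}_{\text{base regret, rescaled to rounds}}
    \;+\;
    \underbrace{O\!\Big(m\,\tfrac{T}{\tau}\Big)}_{\text{memory mismatch at block boundaries}}
    \;=\;
    O\!\Big(\sqrt{\tau T K\log K}\;+\;\tfrac{mT}{\tau}\Big),
    \]
    % and balancing with \tau \asymp T^{1/3} gives O(T^{2/3}) up to K, m, and log factors.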

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback [article]

Zongqi Wan, Xiaoming Sun, Jialin Zhang
2022 arXiv   pre-print
In particular, for the K-armed bandit problem and bandit convex optimization, we obtain an 𝒪(T^{2/3}) policy regret bound. We also prove a matching lower bound for the K-armed bandit.  ...  However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory.  ...  They proved o(T) policy regret bounds for many bandit problems under the bounded-memory assumption. In particular, they proved O(T^{2/3}) policy regret bounds for the K-armed bandit.  ... 
arXiv:2204.12764v2 fatcat:uwumzddq2ra7zclmady4wx2goe

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu
2021 International Conference on Machine Learning  
Moreover, we prove that for any constant c > 0, no algorithm can achieve the asymptotically optimal regret within c log log T batches.  ...  In many real applications, not only the regret but also the batch complexity needs to be optimized. Existing batched bandit algorithms usually assume that the time horizon T is known in advance.  ...  Acknowledgement We thank the anonymous reviewers for their helpful comments. X. Xiao is supported by the Ministry of Education, Singapore, under Tier-2 Grant R-252-000-A70-112. T.  ... 
dblp:conf/icml/JinT0HXG21 fatcat:x6eyr2ck4zcrbal6mxg45rxvdu

Bridging Adversarial and Nonstationary Multi-armed Bandit [article]

Ningyuan Chen, Shuoguang Yang
2022 arXiv   pre-print
We provide algorithms that attain the optimal regret, together with a matching lower bound.  ...  In the multi-armed bandit framework, there are two formulations that are commonly employed to handle time-varying reward distributions: adversarial bandit and nonstationary bandit.  ...  In this section, we provide an upper bound for the regret of Algorithm 2.  ... 
arXiv:2201.01628v2 fatcat:cmszq2xyjrbhlkfj5nba7qgpjm

MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation [article]

Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi
2020 arXiv   pre-print
The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search.  ...  Our main finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms the state-of-the-art dueling bandit algorithms.  ...  We also thank our editor and the anonymous reviewers for extensive comments and suggestions that helped us to improve the paper.  ... 
arXiv:1812.04412v2 fatcat:jgtm6ukpknh3ppmsjyhmfkvh7u

Differentially Private Stochastic Linear Bandits: (Almost) for Free [article]

Osama A. Hanna, Antonious M. Girgis, Christina Fragouli, Suhas Diggavi
2022 arXiv   pre-print
In particular, we achieve a regret of Õ(√T + 1/ϵ), matching the known lower bound for private linear bandits, while the best previously known algorithm achieves Õ(√T/ϵ).  ...  In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models.  ...  A.2 Regret Analysis: We next prove the regret bound of Algorithm 1 for stochastic linear bandits.  ... 
arXiv:2207.03445v1 fatcat:zb6id4lfdjewjocsxoqd7yhu6y
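
One generic way to see why batching helps with privacy: if the learner only ever touches per-batch sufficient statistics, only those few releases need to be privatized, so the privacy noise can enter additively (the 1/ϵ term) instead of multiplying the √T term. The sketch below privatizes one batch's least-squares statistics with the Gaussian mechanism; privatize_batch_stats, the sensitivity constant sens, and the bounds ||x|| ≤ 1, |r| ≤ 1 are all assumptions for illustration, not necessarily this paper's mechanism, and composing across batches would still require splitting the budget.

    import numpy as np

    def privatize_batch_stats(X, r, eps, delta, seed=None):
        """Release a batch's Gram matrix and reward vector with Gaussian noise.
        Generic central-DP pattern (assumed, not this paper's exact mechanism)."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        sens = 2.0                                    # crude l2-sensitivity bound when
                                                      # ||x|| <= 1 and |r| <= 1
        sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
        N = rng.normal(0.0, sigma, (d, d))
        V_priv = X.T @ X + (N + N.T) / np.sqrt(2)     # symmetrized noise on the Gram matrix
        u_priv = X.T @ r + rng.normal(0.0, sigma, d)  # noise on the reward statistics
        return V_priv, u_priv                         # feed into a ridge/OFUL-style update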

Batched Multi-armed Bandits Problem [article]

Zijun Gao, Yanjun Han, Zhimei Ren, Zhengqing Zhou
2019 arXiv   pre-print
In this paper, we propose the BaSE (batched successive elimination) policy to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even  ...  While the minimax regret for two-armed stochastic bandits has been completely characterized in prior work, the effect of the number of arms on the regret for the multi-armed case is still open.  ...  Lower Bound: This section presents lower bounds for the batched multi-armed bandit problem, where in Section 3.1 we design a fixed multiple hypothesis testing problem to show the lower bound for any policies  ... 
arXiv:1904.01763v3 fatcat:lqswsp5lvjgaxix3j3kbpvto4m
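
A minimal sketch of the successive-elimination idea behind a policy like BaSE, assuming rewards in [0, 1]: within each batch, every surviving arm is pulled (approximately) equally often, observations are only consulted when the batch closes, and arms whose upper confidence bound falls below the best lower confidence bound are dropped. The grid, the Hoeffding radius, delta, and the pull callback are illustrative, not the paper's exact specification.

    import numpy as np

    def batched_successive_elimination(pull, K, grid, delta=0.01):
        """Successive elimination with batched feedback (sketch). `grid` is an
        increasing list of batch end times; rewards accumulate during a batch
        but are only used for elimination once the batch closes."""
        active = list(range(K))
        sums, counts = np.zeros(K), np.zeros(K)
        t = 0
        for end in grid:
            while t < end:                             # round-robin over surviving arms
                arm = active[t % len(active)]
                sums[arm] += pull(arm)
                counts[arm] += 1
                t += 1
            n = np.maximum(counts[active], 1)
            means = sums[active] / n
            rad = np.sqrt(np.log(2 * K * end / delta) / (2 * n))  # Hoeffding radius
            best_lcb = np.max(means - rad)
            active = [a for a, m, r in zip(active, means, rad) if m + r >= best_lcb]
        return active                                  # arms still plausibly optimal

For instance, pull could be a Bernoulli draw per arm and grid a geometric schedule such as [100, 1000, 10000] for T = 10000; the choice of grid is exactly what the batched-bandit literature above optimizes.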

Adversarial Blocking Bandits

Nick Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh
2020 Neural Information Processing Systems  
In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded duration an path variation.  ...  We also show that the regret upper bound of RGA is tight if the blocking durations are bounded above by an order of O(1).  ...  We also discuss a potential lower bound for the α-regret of the adversarial blocking bandit problem in the case of B T ∈ o(KT ) and D ∈ O(1).  ... 
dblp:conf/nips/BishopCMT20 fatcat:ivli5awimjcm7nsmlcxq3gebea

Sequential Batch Learning in Finite-Action Linear Contextual Bandits [article]

Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye
2020 arXiv   pre-print
In each setting, we establish a regret lower bound and provide an algorithm whose regret upper bound nearly matches the lower bound.  ...  of batches and can only observe outcomes for the individuals within a batch at the batch's end.  ...  Problem-Dependent Regret Bounds: The regret bounds given in the previous two sections are problem-independent regret bounds (also known as gap-independent regret bounds in the bandits literature): they  ... 
arXiv:2004.06321v1 fatcat:2ilulxsb55aizaqvaipmgtjwe4
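
For reference, the gap-independent/gap-dependent distinction drawn in the last snippet has the familiar K-armed shapes (standard facts, with suboptimality gaps Δ_a = μ* − μ_a as assumed notation, not taken from this paper):

    % Problem-independent (minimax) vs. problem-dependent (gap-dependent) shapes:
    \[
    R(T) = O\big(\sqrt{KT}\big)
    \qquad\text{vs.}\qquad
    R(T) = O\Big(\sum_{a:\,\Delta_a>0}\frac{\log T}{\Delta_a}\Big),
    \]
    % the former holds uniformly over instances; the latter improves on easy ones.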

Lipschitz Bandits with Batched Feedback [article]

Yasong Feng, Zengfeng Huang, Tianyu Wang
2022 arXiv   pre-print
We also provide complexity analysis for this problem. Our theoretical lower bound implies that Ω(log log T) batches are necessary for any algorithm to achieve the optimal regret.  ...  In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches.  ...  Lower Bounds: In this section, we present lower bounds for Lipschitz bandits with batched feedback, which in turn give communication lower bounds for all Lipschitz bandit algorithms.  ... 
arXiv:2110.09722v4 fatcat:gyrpqrwerrblja52ozedcf5pj4

The Impact of Batch Learning in Stochastic Linear Bandits [article]

Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
2022 arXiv   pre-print
That is to say, we provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy.  ...  We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period.  ...  the batch size for a given policy in order to achieve the rate-optimal regret bounds.  ... 
arXiv:2202.06657v1 fatcat:6a732dpmuja6vnwlaoyieb7qbu

Batched bandit problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg
2016 Annals of Statistics  
We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds.  ...  Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of  ...  While optimal regret bounds are well understood for standard multi-armed bandit problems when M = T, a systematic analysis of the batched case does not exist.  ... 
doi:10.1214/15-aos1381 fatcat:7miogv4gsndkbando6db6jetpi
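
To make the batch/regret trade-off concrete: for the two-armed case studied in this line of work, the minimax regret with M batches is commonly cited as scaling, up to logarithmic factors, as below; the exact form is a hedged recollection of this literature rather than a quotation from the paper.

    \[
    R_M(T) \;\asymp\; T^{\frac{1}{2-2^{1-M}}},
    \qquad\text{e.g. } R_2(T)\asymp T^{2/3},
    \qquad R_M(T)\to\sqrt{T}\ \text{as } M\to\infty,
    \]
    % so M = Theta(log log T) batches already suffice for the fully sequential
    % sqrt(T) rate, matching the loglog-type batch complexities cited above.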
Showing results 1–15 of 1,634.