25 Hits in 3.2 sec

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits [article]

Julian Zimmert, Yevgeny Seldin
2022 arXiv   pre-print
We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon.  ...  The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting.  ...  bounds for stochastic bandits with adversarial corruptions in the large C case.  ... 
arXiv:1807.07623v6 fatcat:s7rsxvfqqbea5ixvspeqd4ulcy
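The core of Tsallis-INF (in the α = 1/2 case) is online mirror descent with a Tsallis-entropy regularizer: the sampling distribution over arms has the closed form w_i = 4 / (η(L̂_i − x))², where L̂ are cumulative loss estimates, η is the learning rate (the paper uses η_t ∝ 1/√t), and x is a normalization constant making the weights sum to 1. A minimal sketch of that normalization step, using bisection in place of the paper's Newton iteration (illustrative only, not the authors' reference implementation):

```python
import numpy as np

def tsallis_inf_probs(L_hat, eta, tol=1e-10):
    """Sampling distribution of 1/2-Tsallis-INF: w_i = 4 / (eta * (L_hat_i - x))^2,
    with the normalization constant x found by bisection so that sum(w) = 1.
    (Sketch; the paper solves for x with Newton's method.)"""
    L_hat = np.asarray(L_hat, dtype=float)
    K = len(L_hat)
    # On (lo, min L_hat) the map x -> sum of weights is increasing:
    # near 0 at lo (chosen so 4K / (eta * (min - lo))^2 < 1), +inf near min L_hat.
    lo = L_hat.min() - 2.0 * np.sqrt(K) / eta - 1.0
    hi = L_hat.min() - 1e-12
    while hi - lo > tol:
        x = 0.5 * (lo + hi)
        if (4.0 / (eta * (L_hat - x)) ** 2).sum() > 1.0:
            hi = x
        else:
            lo = x
    w = 4.0 / (eta * (L_hat - 0.5 * (lo + hi))) ** 2
    return w / w.sum()  # tiny renormalization for numerical safety
```

Arms with smaller cumulative estimated loss receive higher probability, while the heavy polynomial tail of the distribution keeps enough exploration for the adversarial guarantee.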

Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions [article]

Saeed Masoudian, Yevgeny Seldin
2021 arXiv   pre-print
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021).  ...  The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases.  ...  Acknowledgments This project has received funding from European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199.  ... 
arXiv:2103.12487v2 fatcat:nydbeqtd3ra5pnhuuu6nqzb674

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [article]

Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi
2021 arXiv   pre-print
We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price λ every time it switches the arm being played.  ...  Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.  ...  NCB is partially supported by the MIUR PRIN grant Algorithms, Games, and Digital Markets (ALGADIMAR) and by the EU Horizon 2020 ICT-48 research and innovation action number 951847, ELISE (European Learning  ... 
arXiv:2102.09864v1 fatcat:7kxu7mz53vcodcbadtcc6mqawu
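The cost model in the switching-cost papers above is simply cumulative loss plus λ for every round in which the played arm changes. A sketch of that accounting (function and argument names are illustrative):

```python
def total_cost(losses, arms, lam):
    """Cumulative loss plus lam per arm switch, the cost model of bandits
    with switching costs: the learner pays lam whenever arms[t] != arms[t-1].
    (Illustrative accounting only, not part of any paper's algorithm.)"""
    switches = sum(a != b for a, b in zip(arms, arms[1:]))
    return sum(losses) + lam * switches
```

Because every switch is charged, algorithms in this setting play in blocks of growing length, which is where the Θ(T^(2/3)) minimax rate in the adversarial regime comes from.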

Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations [article]

Haike Xu, Jian Li
2021 arXiv   pre-print
We also study the setting where we only get access to an approximation oracle for the stochastic combinatorial semi-bandit problem.  ...  We consider the stochastic combinatorial semi-bandit problem with adversarial corruptions.  ...  , Turing AI Institute of Nanjing and Xi'an Institute for Interdisciplinary Information Core Technology.  ... 
arXiv:2106.06712v1 fatcat:ws4jztozy5fslkvickdznuweu4

Better Best of Both Worlds Bounds for Bandits with Switching Costs [article]

Idan Amir, Guy Azov, Tomer Koren, Roi Livni
2022 arXiv   pre-print
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021.  ...  We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound of 𝒪(T^2/3) in the oblivious adversarial setting and a bound of 𝒪(min{log (T)/Δ^2,  ...  in Machine Learning, and from an unrestricted gift from Google.  ... 
arXiv:2206.03098v1 fatcat:64kfjkqdbzdq5kxqcwugan44tu

On Optimal Robustness to Adversarial Corruption in Online Decision Problems [article]

Shinji Ito
2021 arXiv   pre-print
We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruptions.  ...  For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.  ...  Acknowledgments and Disclosure of Funding The author was supported by JST, ACT-I, Grant Number JPMJPR18U5, Japan.  ... 
arXiv:2109.10963v1 fatcat:a7yhoncfafcijhwbhq27kvmcxm

Banker Online Mirror Descent [article]

Jiatai Huang, Longbo Huang
2021 arXiv   pre-print
Banker-OMD achieves nearly-optimal performance in all the three settings. In particular, it leads to the first delayed adversarial linear bandit algorithm achieving Õ(poly(n)(√(T) + √(D))) regret.  ...  Banker-OMD allows algorithms to robustly handle delayed feedback, and offers a general methodology for achieving Õ(√(T) + √(D))-style regret bounds in various delayed-feedback online learning tasks, where  ...  algorithms for delayed adversarial MAB (Banker-Tsallis-INF) and delayed linear bandits (Banker-BOLO).  ... 
arXiv:2106.08943v1 fatcat:iy2kbmee2fconkdezogxr7ozta

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [article]

Aadirupa Saha, Pierre Gaillard
2022 arXiv   pre-print
In particular, we give the first best-of-both world result for the dueling bandits regret minimization problem – a unified framework that is guaranteed to perform optimally for both stochastic and adversarial  ...  We study the problem of K-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pair of decisions  ...  Acknowledgment Thanks to Julian Zimmert and Karan Singh for the useful discussions on the existing best-of-both-world multiarmed bandits results.  ... 
arXiv:2202.06694v1 fatcat:hd2j4clntzafzhcdjsndsek3gq

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits [article]

Jiatai Huang, Yan Dai, Longbo Huang
2022 arXiv   pre-print
Specifically, we design an algorithm, when the heavy-tail parameters α and σ are known to the agent, simultaneously achieves the optimal regret for both stochastic and adversarial environments, without  ...  In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where  ...  Acknowledgment This work is supported by the Technology and Innovation Major Project of the Ministry of Science and Technology of China under Grant 2020AAA0108400 and 2020AAA0108403.  ... 
arXiv:2201.11921v2 fatcat:24fbsknv6reuriueuep4cexzfi

The Pareto Frontier of model selection for general Contextual Bandits [article]

Teodor V. Marinov, Julian Zimmert
2021 arXiv   pre-print
It asks whether it is possible to obtain simultaneously the optimal single algorithm guarantees over all policies in a nested sequence of policy classes, or if otherwise this is possible for a trade-off  ...  We present a Pareto frontier of up to logarithmic factors matching upper and lower bounds, thereby proving that an increase in the complexity term ln(|Π_m|) independent of T is unavoidable for general  ...  We thank Tor Lattimore for pointing us to the technicalities required for bounding the total variation of improper algorithms.  ... 
arXiv:2110.13282v1 fatcat:ozhsrm2rfna4rfrvhffmj44hb4

Locally Differentially Private (Contextual) Bandits Learning [article]

Kai Zheng, Tianle Cai, Weiran Huang, Zhenguo Li, Liwei Wang
2021 arXiv   pre-print
Based on our frameworks, we can improve previous best results for private bandits learning with one-point feedback, such as private Bandits Convex Optimization, and obtain the first result for Bandits  ...  Further, we extend our (ε, δ)-LDP algorithm to Generalized Linear Bandits, which enjoys a sub-linear regret Õ(T^3/4/ε) and is conjectured to be nearly optimal.  ...  See Appendix G for more discussions and intuitions. [42] Julian Zimmert and Yevgeny Seldin. "An Optimal Algorithm for Stochastic and Adversarial Bandits".  ... 
arXiv:2006.00701v4 fatcat:quapec7ss5fl7hlzzurpbvin3i
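The standard way to obtain local differential privacy from one-point loss feedback, as studied in the entry above, is to perturb each observed loss on the user's side before it reaches the learner. A textbook sketch using the Laplace mechanism for losses in [0, 1] (illustrative of the general idea, not the paper's exact protocol):

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize_loss(loss, eps):
    """Make a single observed loss in [0, 1] eps-locally-differentially-private
    by adding Laplace noise of scale 1/eps (the standard Laplace mechanism for
    sensitivity-1 values; illustrative, not the paper's specific construction)."""
    return loss + rng.laplace(scale=1.0 / eps)
```

The privatized loss is an unbiased estimate of the true loss, so downstream bandit estimators stay unbiased; the price of privacy shows up as extra variance of order 1/ε², which is what drives the 1/ε factors in the regret bounds.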

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously [article]

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, Xiaojin Zhang
2021 arXiv   pre-print
Moreover, by equipping this algorithm with an adversarial component and carefully-designed testings, our second algorithm additionally enjoys minimax-optimal regret in completely adversarial environments  ...  By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments  ...  Acknowledgements We thank Tor Lattimore and Julian Zimmert for helpful discussions. HL thanks Ilias Diakonikolas and Anastasia Voloshinov for initial discussions in this direction.  ... 
arXiv:2102.05858v3 fatcat:c2knat5jpvheznchwuvp5t2nau

Adapting to Misspecification in Contextual Bandits [article]

Dylan J. Foster and Claudio Gentile and Mehryar Mohri and Julian Zimmert
2021 arXiv   pre-print
Given access to an online oracle for square loss regression, our algorithm attains optimal regret and – in particular – optimal dependence on the misspecification level, with no prior knowledge.  ...  We introduce a new family of oracle-efficient algorithms for ε-misspecified contextual bandits that adapt to unknown model misspecification – both for finite and infinite action settings.  ...  We thank Teodor Marinov and Alexander Rakhlin for discussions on related topics.  ... 
arXiv:2107.05745v1 fatcat:aapvoy6xovh4nd5lizacrwr5ai

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [article]

Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
2022 arXiv   pre-print
The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme.  ...  We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.  ...  Acknowledgments and Disclosure of Funding CR and YS acknowledge partial support by the Independent Research Fund Denmark, grant number 9040-00361B.  ... 
arXiv:2206.00557v1 fatcat:3gmeoufilfd2hov545f2j5horq

Nonstochastic Bandits with Composite Anonymous Feedback [article]

Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour
2021 arXiv   pre-print
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way.  ...  stability and regret of the original algorithm.  ...  Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. J. Mach. Learn. Res., 22:28–1, 2021.  ... 
arXiv:2112.02866v1 fatcat:v4ki7ulxlbg3zilgqlwrtzyway
Showing results 1 — 15 out of 25 results