3,993 Hits in 4.0 sec

Bandits with adversarial scaling [article]

Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme
2020 arXiv   pre-print
We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component.  ...  On the positive side, we show that two algorithms, one from the action-elimination family and one from the mirror-descent family, are adaptive enough to be robust to adversarial scaling.  ...  Moreover, Thompson Sampling, although ineffective against the cold-start attack, performs very well under the small-means attack; this suggests that its analysis could be tightened to scale with the  ... 
arXiv:2003.02287v2 fatcat:gj6nh4w2sjbq5elti6w5mepjte

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays [article]

Jiatai Huang, Yan Dai, Longbo Huang
2022 arXiv   pre-print
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.  ...  We then present two instances of SFD-INF, each with carefully designed delay-adapted learning scales.  ...  Introduction Multi-Armed Bandit (MAB) [4] is a classical sequential decision game carried out between an agent and an adversary that lasts for T rounds.  ... 
arXiv:2110.13400v2 fatcat:63ddy4uzrfaltg5bzxxk2rabsa

Achieving Privacy in the Adversarial Multi-Armed Bandit

Aristide Tossou, Christos Dimitrakakis
2017 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
In this paper, we improve the previously best known regret bound to achieve ε-differential privacy in oblivious adversarial bandits from O(T^(2/3)/ε) to O(√(T ln T)/ε).  ...  This allows us to reach O(√(ln T))-DP, with a regret of O(T^(2/3)) that holds against an adaptive adversary, an improvement from the best known of O(T^(3/4)).  ...  This research was supported by the SNSF grants "Adaptive control with approximate Bayesian computation and differential privacy" and "Swiss Sense Synergy", by the Marie Curie Actions (REA 608743), the  ... 
doi:10.1609/aaai.v31i1.10896 fatcat:5crfpp5ikzcphkumuapv2zonfm

Achieving Privacy in the Adversarial Multi-Armed Bandit [article]

Aristide C. Y. Tossou, Christos Dimitrakakis
2017 arXiv   pre-print
In this paper, we improve the previously best known regret bound to achieve ϵ-differential privacy in oblivious adversarial bandits from O(T^(2/3)/ϵ) to O(√(T ln T)/ϵ).  ...  This allows us to reach O(√(ln T))-DP, with a regret of O(T^(2/3)) that holds against an adaptive adversary, an improvement from the best known of O(T^(3/4)).  ...  This research was supported by the SNSF grants "Adaptive control with approximate Bayesian computation and differential privacy" and "Swiss Sense Synergy", by the Marie Curie Actions (REA 608743), the  ... 
arXiv:1701.04222v1 fatcat:fijr2uopovad3nrazvhqewd4a4

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback [article]

Zongqi Wan, Xiaoming Sun, Jialin Zhang
2022 arXiv   pre-print
We study the adversarial bandit problem with composite anonymous delayed feedback.  ...  However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory.  ...  Then the stochastic process W_t equipped with parent function ρ* is called a multi-scale random walk.  ... 
arXiv:2204.12764v2 fatcat:uwumzddq2ra7zclmady4wx2goe

Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences

Aadirupa Saha, Pierre Gaillard
2022 International Conference on Machine Learning  
We study the problem of K-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision  ...  In particular, we give the first best-of-both-world result for the dueling bandits regret minimization problem: a unified framework that is guaranteed to perform optimally for both stochastic and adversarial  ...  Moreover, Zimmert & Seldin (2021) also provide an upper bound for stochastic bandits with adversarial corruption.  ... 
dblp:conf/icml/SahaG22 fatcat:l4uhliigxvfcxfor4ubomlpb7u

Near Optimal Adversarial Attack on UCB Bandits [article]

Shiliang Zuo
2020 arXiv   pre-print
We consider a stochastic multi-armed bandit problem where rewards are subject to adversarial corruption.  ...  We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm T - o(T) times with a cumulative cost that scales as √(log T), where T is the number of rounds  ...  Related Work Our adversary model follows closely that of [2]. In their work, the authors showed an attack strategy against the UCB algorithm with a cumulative attack cost scaling as O(log T).  ... 
arXiv:2008.09312v1 fatcat:vmjwkuivzvdxre42ej75efmwta

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [article]

Aadirupa Saha, Pierre Gaillard
2022 arXiv   pre-print
In summary, we believe our reduction idea will find a broader scope in solving a diverse class of dueling bandit settings, which are otherwise studied separately from multi-armed bandits, often with more  ...  This resolves the long-standing problem of designing an instance-wise gap-dependent order-optimal regret algorithm for dueling bandits (with matching lower bounds up to small constant factors).  ...  Acknowledgment Thanks to Julian Zimmert and Karan Singh for the useful discussions on the existing best-of-both-world multi-armed bandit results.  ... 
arXiv:2202.06694v1 fatcat:hd2j4clntzafzhcdjsndsek3gq

Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm

Lin Yang, Mohammad Hassan Hajiesmaili, Mohammad Sadegh Talebi, John C. S. Lui, Wing Shing Wong
2020 Neural Information Processing Systems  
This paper studies adversarial bandits with corruptions. In the basic adversarial bandit setting, the reward of arms is predetermined by an adversary who is oblivious to the learner's policy.  ...  Second, we propose a bandit algorithm that incorporates a biased estimator and a robustness parameter to deal with corruption.  ...  Adversarial Bandits with Corruptions Consider an adversarial bandit problem where an adversary and an attacker, with a more powerful ability to manipulate the rewards, coexist.  ... 
dblp:conf/nips/YangHTLW20 fatcat:xdejbgcqvvb7vm6hejzr2abfpm

Spectrum bandit optimization

Marc Lelarge, Alexandre Proutiere, M. Sadegh Talebi
2013 2013 IEEE Information Theory Workshop (ITW)  
We formulate this problem as a generic linear bandit problem, and analyze it in a stochastic setting where radio conditions are driven by an i.i.d. stochastic process, and in an adversarial setting where  ...  ADVERSARIAL BANDIT PROBLEM In this section, we study the problem in the adversarial setting.  ...  As far as we know, adversarial bandit problems have not been considered to model spectrum allocation issues.  ... 
doi:10.1109/itw.2013.6691221 dblp:conf/itw/LelargePT13 fatcat:jfa5xhu7wrfkdmwk6f7uaxpje4

When Are Linear Stochastic Bandits Attackable? [article]

Huazheng Wang, Haifeng Xu, Hongning Wang
2022 arXiv   pre-print
We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm.  ...  This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits.  ...  Zimmert and Seldin (2021) ; Masoudian and Seldin (2021) proposed best-of-both-world solutions for both stochastic and adversarial bandits which also solved stochastic bandits with adversarial corruption  ... 
arXiv:2110.09008v2 fatcat:umsuhse2tfdjjjph4lqeizk3uq

Advancements in Dueling Bandits

Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem.  ...  As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years.  ...  DTS is the state-of-the-art in the case of small-scale dueling bandit problems, while MergeRUCB is the state-of-the-art for large-scale dueling bandit algorithms.  ... 
doi:10.24963/ijcai.2018/776 dblp:conf/ijcai/SuiZHY18 fatcat:vfao6bpxt5aifbwyvtk3wg2cu4

Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

Ziwei Guan, Kaiyi Ji, Donald J. Bucci Jr., Timothy Y. Hu, Joseph Palombo, Michael Liston, Yingbin Liang
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player.  ...  This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks.  ...  The adversarial multi-armed bandit model, in which an adversary is allowed to attack in each round with each attack subject to a bounded value.  ... 
doi:10.1609/aaai.v34i04.5821 fatcat:se34f6ghfjgvrk3okzznuz2nnm

Adversarial bandits for drawing generalizable conclusions in non-adversarial experiments: an empirical study

Yang Zhi-Han, Shiyue Zhang, Anna Rafferty, Antonija Mitrovic, Nigel Bosch
2022 Zenodo  
Experimental designs using multi-armed bandit (MAB) algorithms vary the probability of condition assignment for a new student based on prior results, placing more students in more effective conditions.  ...  Instead, we propose using adversarial MAB algorithms, which are less exploitative and thus may exhibit more robustness.  ...  To eliminate scaling issues, rewards are scaled to a fixed range as follows.  ... 
doi:10.5281/zenodo.6853038 fatcat:i5gkjhqmonbbnd5us3wgg53e7a

Corralling a Band of Bandit Algorithms [article]

Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
2017 arXiv   pre-print
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the  ...  Our results are applicable to many settings, such as multi-armed bandits, contextual bandits, and convex bandits. As examples, we present two main applications.  ...  EXP4: O(√(KT ln |Θ|)), 1/2, adversarial contextual bandit; SCRiBLe: O(d^(3/2) √T), 1/2, adversarial linear bandit; BGD: O(d √L T^(3/4)), 1/4, adversarial convex bandit; Thompson Sampling: O(√(T K H(θ*))), 1/2, stochastic  ... 
arXiv:1612.06246v3 fatcat:eighvfwsfzajfozv4sigtwjudq
Showing results 1-15 out of 3,993 results