2,114 Hits in 3.2 sec

Mixing bandits

Stéphane Caron, Smriti Bhagat
2013 Proceedings of the 7th Workshop on Social Network Mining and Analysis - SNAKDD '13  
We propose two novel strategies leveraging neighborhood estimates to improve the learning rate of bandits for cold-start users.  ...  In this work, we model the learning of preferences of cold-start users using multi-armed bandits [5] embedded in a social network.  ...  Preliminaries A $K$-armed bandit problem is defined by $K$ distributions $P_1, \ldots, P_K$, one for each "arm" of the bandit, with respective means $p_1, \ldots, p_K$.  ... 
doi:10.1145/2501025.2501029 dblp:conf/kdd/CaronB13 fatcat:kfskz2kdqzhg7ao44szgt6mzji
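
The last snippet gives the standard definition of a $K$-armed bandit: $K$ reward distributions $P_1, \ldots, P_K$ with means $p_1, \ldots, p_K$. A minimal, self-contained sketch of that setup with Bernoulli arms and an epsilon-greedy learner (the arm means and epsilon are illustrative, not taken from the paper):

```python
import random

class BernoulliBandit:
    """K-armed bandit: arm k pays 1 with probability means[k], else 0."""
    def __init__(self, means):
        self.means = means

    def pull(self, k):
        return 1 if random.random() < self.means[k] else 0

def epsilon_greedy(bandit, horizon, eps=0.1):
    K = len(bandit.means)
    counts = [0] * K          # pulls per arm
    values = [0.0] * K        # running mean reward per arm
    total = 0
    for _ in range(horizon):
        if random.random() < eps:
            k = random.randrange(K)                      # explore
        else:
            k = max(range(K), key=lambda a: values[a])   # exploit
        r = bandit.pull(k)
        counts[k] += 1
        values[k] += (r - values[k]) / counts[k]         # incremental mean
        total += r
    return total

random.seed(0)
print(epsilon_greedy(BernoulliBandit([0.2, 0.5, 0.8]), horizon=10_000))
```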

Privacy-Preserving Bandits [article]

Mohammad Malekzadeh, Dimitrios Athanasakis, Hamed Haddadi, Benjamin Livshits
2020 arXiv   pre-print
These results suggest P2B is an effective approach to challenges arising in on-device privacy-preserving personalization.  ...  Contextual bandit algorithms (CBAs) often rely on personal data to provide recommendations.  ...  In the following we provide some preliminary material on privacy-preserving data release mechanisms in relation to personalization with contextual bandit algorithms.  ... 
arXiv:1909.04421v4 fatcat:ynffzmb3czc33dxlkncocevyja
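
The snippet mentions privacy-preserving data release mechanisms for personalization without showing P2B's own construction. As a generic illustration of such a mechanism, not the paper's method, here is $k$-ary randomized response on a discrete context, a standard local-differential-privacy primitive (the domain size and epsilon are arbitrary):

```python
import math
import random

def randomized_response(value, domain_size, epsilon):
    """k-ary randomized response: report the true value with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other value.
    Satisfies epsilon-local differential privacy."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if random.random() < p_true:
        return value
    other = random.randrange(domain_size - 1)
    return other if other < value else other + 1

random.seed(1)
# A discrete context (e.g. a category id out of 10) released privately.
print([randomized_response(3, domain_size=10, epsilon=1.0) for _ in range(5)])
```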

Denumerable-Armed Bandits

Jeffrey S. Banks, Rangarajan K. Sundaram
1992 Econometrica  
Implications of these results are derived for the theories of job search and matching, as well as other applications of the bandit paradigm.  ...  identifying all optimal strategies for finite-armed bandits may be extended to infinite-armed bandits.  ...  Step 0: A Preliminary Result. The following Proposition is an immediate consequence of Proposition IV-3-12 of Neveu (1975).  ... 
doi:10.2307/2951539 fatcat:c7tkved7b5cs7dda7cjkrhj43m

Bandits with an Edge [article]

Dotan Di Castro, Claudio Gentile, Shie Mannor
2011 arXiv   pre-print
We consider a bandit problem over a graph where the rewards are not directly observed.  ...  Model and Preliminaries In this section we describe the classical Multi-Armed Bandit (MAB) setup, describe the Graphical Bandit (GB) setup, state/recall two concentration bounds for sequences of random  ...  Therefore, the resulting algorithms and analyses are quite different.  ... 
arXiv:1109.2296v1 fatcat:rqaxdtmarzgvfpux2w3ze7yzdi

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits [article]

Yogev Bar-On, Yishay Mansour
2019 arXiv   pre-print
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem.  ...  Bistritz and Leshem [2018]  ...  Preliminaries We consider a nonstochastic multi-armed bandit problem over a finite action set $A = \{1, \ldots, K\}$ played by $N$ agents.  ...  Center-based cooperative multi-armed bandits We now present the center-based policy for the cooperative multi-armed bandit setting, which will give us the desired low individual regret.  ... 
arXiv:1907.03346v3 fatcat:7p5jynnkcrfkrpiy2wnxqrj6si
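
The snippet fixes the single-agent ingredient: a nonstochastic bandit over the action set $A = \{1, \ldots, K\}$. A minimal sketch of the standard EXP3 learner for that ingredient (the paper's cooperative message-passing layer is not reproduced; the learning rate and toy losses are illustrative):

```python
import math
import random

def exp3(loss_fn, K, horizon, eta=0.05):
    """EXP3 for nonstochastic bandits: exponential weights over K actions
    updated with importance-weighted loss estimates."""
    weights = [1.0] * K
    for t in range(horizon):
        total = sum(weights)
        probs = [w / total for w in weights]
        a = random.choices(range(K), weights=probs)[0]
        loss = loss_fn(t, a)                            # adversarial loss in [0, 1]
        weights[a] *= math.exp(-eta * loss / probs[a])  # unbiased loss estimate
    total = sum(weights)
    return [w / total for w in weights]

random.seed(2)
# Toy adversarial losses: action 0 is better on average.
print(exp3(lambda t, a: 0.3 if a == 0 else 0.6, K=3, horizon=2000))
```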

Introduction to Multi-Armed Bandits [article]

Aleksandrs Slivkins
2021 arXiv   pre-print
The first four chapters are on IID rewards, from the basic model to impossibility results to Bayesian priors to Lipschitz rewards.  ...  The chapters on "bandits with similarity information", "bandits with knapsacks" and "bandits and agents" can also be consumed as standalone surveys on the respective topics.  ...  We are also interested in comparing BIC bandit algorithms with optimal bandit algorithms. Preliminaries.  ... 
arXiv:1904.07272v6 fatcat:rwq7spo67nfrhfl6jwa3uqcd7q

Federated Linear Contextual Bandits [article]

Ruiquan Huang, Weiqiang Wu, Jing Yang, Cong Shen
2021 arXiv   pre-print
This paper presents a novel federated linear contextual bandits model, where individual clients face different K-armed stochastic bandits coupled through common global parameters.  ...  WW's work was done before he joined Facebook.  ...  These results demonstrate the effectiveness of collaborative learning in the federated bandits setting.  ... 
arXiv:2110.14177v1 fatcat:y322gkg7cvfcrnqckr5lfrdljm
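
The abstract describes clients facing different bandits coupled through common global parameters. As a hedged sketch of that kind of coupling, not the paper's algorithm: each client summarizes its data as ridge-regression sufficient statistics for a shared linear reward model, and a server pools them (all names and data here are hypothetical):

```python
import numpy as np

def local_stats(contexts, rewards, lam=1.0):
    """Per-client ridge-regression sufficient statistics for a shared
    linear reward model r = x . theta + noise."""
    X, y = np.asarray(contexts), np.asarray(rewards)
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y
    return A, b

def aggregate(stats):
    """Server step: pool statistics and solve for the global parameter."""
    A = sum(A_i for A_i, _ in stats)
    b = sum(b_i for _, b_i in stats)
    return np.linalg.solve(A, b)

rng = np.random.default_rng(3)
theta = np.array([1.0, -0.5])
clients = []
for _ in range(4):  # four clients, each with its own local data
    X = rng.normal(size=(50, 2))
    y = X @ theta + 0.1 * rng.normal(size=50)
    clients.append(local_stats(X, y))
print(aggregate(clients))  # close to theta
```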

Adversarial Attacks on Stochastic Bandits [article]

Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu
2018 arXiv   pre-print
The result means the attacker can easily hijack the behavior of the bandit algorithm to promote or obstruct certain actions, say, a particular medical treatment.  ...  As bandits are seeing increasingly wide use in practice, our study exposes a significant security threat.  ...  Our main result is the following general upper bound on the cumulative attack cost. Theorem 1. Let δ ≤ 1/2.  ... 
arXiv:1810.12188v1 fatcat:kc5dt6pvb5g65aidbxw7botyze
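
The snippet claims an attacker can hijack a bandit algorithm's behavior by perturbing rewards. A toy illustration of that threat model, not the paper's attack (whose perturbations are computed adaptively from confidence bounds): a man-in-the-middle depresses every non-target arm's reward before the epsilon-greedy learner observes it.

```python
import random

def run(horizon, means, target=None, penalty=0.8, eps=0.1):
    """Epsilon-greedy learner on Bernoulli arms; if `target` is set, an
    attacker subtracts `penalty` from every non-target arm's observed
    reward, steering the learner toward `target`."""
    K = len(means)
    counts, values = [0] * K, [0.0] * K
    for _ in range(horizon):
        if random.random() < eps:
            a = random.randrange(K)                      # explore
        else:
            a = max(range(K), key=lambda i: values[i])   # exploit
        r = 1.0 if random.random() < means[a] else 0.0
        if target is not None and a != target:
            r -= penalty                                 # attacker's perturbation
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
    return counts

random.seed(4)
print(run(20_000, [0.9, 0.2]))            # honest run: mostly arm 0
print(run(20_000, [0.9, 0.2], target=1))  # attacked run: mostly arm 1
```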

Graphical Models for Bandit Problems [article]

Kareem Amin, Michael Kearns, Umar Syed
2012 arXiv   pre-print
We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any  ...  Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action  ...  In other words, we join $i$ and $i'$ by an edge if and only if there is some potential function $f_P$ that depends jointly on the variables $i$ and $i'$.  ... 
arXiv:1202.3782v1 fatcat:l5vdorprcvdo3c3sagx4yftsaq
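
The last snippet states the graph construction: variables $i$ and $i'$ are joined by an edge if and only if some potential function $f_P$ depends jointly on both. A sketch of that construction, with potential scopes given as index tuples (the example scopes are made up):

```python
from itertools import combinations

def induced_graph(num_vars, scopes):
    """Join i and j by an edge iff some potential's scope contains both."""
    edges = set()
    for scope in scopes:
        for i, j in combinations(sorted(set(scope)), 2):
            edges.add((i, j))
    return {"vertices": list(range(num_vars)), "edges": sorted(edges)}

# Potentials f_P over variable subsets; e.g. a chain plus one triple.
print(induced_graph(5, scopes=[(0, 1), (1, 2), (2, 3, 4)]))
```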

Better Algorithms for Benign Bandits [chapter]

Elad Hazan, Satyen Kale
2009 Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms  
Our algorithm is efficient and applies several new ideas to bandit optimization, such as reservoir sampling. Keywords: multi-armed bandit, regret minimization, online learning.  ...  The term bandit refers to the setting where one only obtains the cost of the decision used in a given iteration and no other information.  ...  A preliminary version of this result was presented in Hazan and Kale (2009a).  ... 
doi:10.1137/1.9781611973068.5 fatcat:kij2svx5gfhgrgwth2dxd4ki4m
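
The abstract names reservoir sampling as one of the new ingredients. For reference, a standard size-$k$ reservoir sample over a stream (Vitter's Algorithm R), which retains each stream element with probability $k/n$:

```python
import random

def reservoir_sample(stream, k):
    """Algorithm R: maintain a uniform random sample of size k from a
    stream of unknown length, using O(k) memory."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)
        else:
            j = random.randrange(n)   # uniform in [0, n)
            if j < k:                 # item survives with probability k/n
                reservoir[j] = item
    return reservoir

random.seed(5)
print(reservoir_sample(range(10_000), k=5))
```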

Strategic Experimentation with Exponential Bandits

Martin Cripps, Godfrey Keller, Sven Rady
2003 Social Science Research Network  
This paper studies a game of strategic experimentation with two-armed bandits whose risky arm might yield a payoff only after some exponentially distributed random time.  ...  First, all our results apply to bandit problems where the known arm generates a stationary non-deterministic stream of payoffs - we can simply reinterpret $s$ as the expected flow payoff.  ...  The intuition for this result is simple.  ... 
doi:10.2139/ssrn.372200 fatcat:t6fvqtm6lzekfchm4hwz5hqvaa
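
In this model the risky arm, if it is good, yields its first payoff after an exponentially distributed random time, and absent a breakthrough the belief that the arm is good decays by Bayes' rule. A small simulation of that mechanic (the prior, intensity, and horizon are illustrative):

```python
import math
import random

def posterior_no_success(p0, lam, t):
    """Belief that the risky arm is good after experimenting for time t
    without a breakthrough (Bayes' rule with an Exp(lam) arrival)."""
    num = p0 * math.exp(-lam * t)
    return num / (num + 1.0 - p0)

def simulate(p0=0.5, lam=1.0, horizon=3.0, seed=6):
    """If the arm is good, the first payoff arrives at an
    Exp(lam)-distributed random time; otherwise it never arrives."""
    random.seed(seed)
    good = random.random() < p0
    arrival = random.expovariate(lam) if good else float("inf")
    return good, arrival, posterior_no_success(p0, lam, horizon)

print(simulate())  # (is_good, breakthrough_time, belief_if_no_news_by_t=3)
```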

Strategic Experimentation with Exponential Bandits

Godfrey Keller, Sven Rady, Martin Cripps
2005 Econometrica  
This paper studies a game of strategic experimentation with two-armed bandits whose risky arm might yield a payoff only after some exponentially distributed random time.  ...  First, all our results apply to bandit problems where the known arm generates a stationary non-deterministic stream of payoffs - we can simply reinterpret $s$ as the expected flow payoff.  ...  The intuition for this result is simple.  ... 
doi:10.1111/j.1468-0262.2005.00564.x fatcat:jsyhrblwbbc5jaxgdkslh4jq4u

Best-Arm Identification in Linear Bandits [article]

Marta Soare, Alessandro Lazaric, Rémi Munos
2014 arXiv   pre-print
We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward  ...  The results.  ...  The multi-armed bandit case.  ... 
arXiv:1409.6110v2 fatcat:u65bel3bmnfmplkn7fj5vnuuyy
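
The snippet defines the objective: rewards are linear in an unknown $\theta^*$, and the goal is to return the arm $x$ maximizing $x^\top \theta^*$. A naive sketch of the estimation step only, using uniform allocation in place of the paper's allocation strategy (the arms, noise level, and regularizer are illustrative):

```python
import numpy as np

def best_arm_estimate(arms, X, y, lam=1e-3):
    """Regularized least-squares estimate of theta from observed pulls,
    then return the arm maximizing the predicted reward x . theta_hat."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    theta_hat = np.linalg.solve(A, X.T @ y)
    return max(range(len(arms)), key=lambda i: arms[i] @ theta_hat)

rng = np.random.default_rng(7)
theta = np.array([0.8, 0.3])  # the unknown parameter theta*
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.6, 0.6])]
idx = rng.integers(0, 3, size=200)          # naive uniform allocation
X = np.stack([arms[i] for i in idx])
y = X @ theta + 0.1 * rng.normal(size=200)  # noisy linear rewards
print(best_arm_estimate(arms, X, y))        # expect arm 0 (reward 0.8)
```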

Recurrent Submodular Welfare and Matroid Blocking Bandits [article]

Orestis Papadigenopoulos, Constantine Caramanis
2021 arXiv   pre-print
A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, yields a $(1-\frac{1}{e})$-approximation for partition matroids, yet it only guarantees a $\frac  ...  A recent line of research focuses on the study of the stochastic multi-armed bandits problem (MAB), in the case where temporal correlations of specific structure are imposed between the player's actions  ...  For the graph $G_d = (V_d, E_d)$, we have $V_d = V_{d-1} \cup \{u_d\}$ and $E_d = E_{d-1} \cup \{\{u, u_d\} : u \in V_{d-1}\}$ (namely, $G_d$ is essentially the result of the join operation between $G_{d-1}$ and a single  ... 
arXiv:2102.00321v3 fatcat:rbk4uwywffbb3ooar5ry4z7y3i
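
The last snippet defines $G_d$ from $G_{d-1}$ by joining a new vertex $u_d$ to every existing vertex. A direct sketch of that join operation, with plain vertex/edge sets and $K_4$ built as a usage example:

```python
def join_vertex(graph, new_vertex):
    """G_d from G_{d-1}: add u_d and connect it to every existing vertex,
    i.e. V_d = V_{d-1} | {u_d}, E_d = E_{d-1} | {{u, u_d} : u in V_{d-1}}."""
    V, E = graph
    return V | {new_vertex}, E | {frozenset((u, new_vertex)) for u in V}

G = (set(), set())
for d in range(4):              # builds the complete graph K_4
    G = join_vertex(G, d)
print(sorted(tuple(sorted(e)) for e in G[1]))
```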

Best Arm Identification in Graphical Bilinear Bandits [article]

Geovani Rizk and Albert Thomas and Igor Colin and Rida Laraki and Yann Chevaleyre
2021 arXiv   pre-print
By efficiently exploiting the geometry of this bandit problem, we propose a decentralized allocation strategy based on random sampling with theoretical guarantees.  ...  We introduce a new graphical bilinear bandit problem where a learner (or a central entity) allocates arms to the nodes of a graph and observes for each edge a noisy bilinear reward representing  ...  Hence, all the previous results hold for this more general graphical bilinear bandit problem, provided any dependence in $d$ is modified to $d + 1$.  ... 
arXiv:2012.07641v3 fatcat:yocskml56zgk7jig7i7wwmqawm
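
The abstract's reward is bilinear: for each edge, the learner observes a noisy bilinear function of the arms assigned to the edge's endpoints, roughly $x_i^\top M x_j$ plus noise for some unknown matrix $M$. A sketch of computing those per-edge observations under an assumed $M$ and graph (both illustrative, not taken from the paper):

```python
import numpy as np

def edge_rewards(edges, assignment, arms, M, rng, noise=0.05):
    """For each edge (i, j), observe a noisy bilinear reward x_i^T M x_j
    of the arms assigned to the edge's endpoints."""
    out = {}
    for i, j in edges:
        x, y = arms[assignment[i]], arms[assignment[j]]
        out[(i, j)] = float(x @ M @ y) + noise * rng.normal()
    return out

rng = np.random.default_rng(8)
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
M = np.array([[0.9, 0.1], [0.2, 0.5]])  # assumed reward matrix
edges = [(0, 1), (1, 2), (0, 2)]        # a triangle graph
assignment = {0: 0, 1: 1, 2: 0}         # node -> arm index
print(edge_rewards(edges, assignment, arms, M, rng))
```
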
Showing results 1 — 15 out of 2,114 results