
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without [article]

Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke
2019 arXiv   pre-print
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem.  ...  We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely T^{1-1/(2m)} where m is the number of players.  ... 
arXiv:1904.12233v2 fatcat:pv5u3qfa7jcp7elowk7w52rnya
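The cooperative collision model shared by several of these entries can be made concrete with a minimal simulation (all names and parameters here are illustrative, not taken from any of the papers): m players each pull an arm; a player who collides observes a collision flag and receives zero reward, while a sole player on an arm draws a Bernoulli reward.

```python
import random

def play_round(num_arms, arm_means, choices):
    """Simulate one round of a multi-player bandit with collision feedback.

    choices: list of arm indices, one entry per player.
    Returns (rewards, collision_flags), one entry per player.
    """
    counts = [choices.count(a) for a in range(num_arms)]
    rewards, collisions = [], []
    for arm in choices:
        if counts[arm] > 1:          # collision: zero reward, flag observed
            rewards.append(0.0)
            collisions.append(True)
        else:                        # sole player: Bernoulli reward
            rewards.append(float(random.random() < arm_means[arm]))
            collisions.append(False)
    return rewards, collisions

# Players 0 and 1 collide on arm 0; player 2 plays arm 2 alone.
rewards, flags = play_round(3, [0.9, 0.5, 0.1], [0, 0, 2])
```

In the "no-sensing" variants discussed in the next entries, the collision flag is withheld and players see only the (zeroed) reward, which is what makes that feedback model substantially harder.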

On No-Sensing Adversarial Multi-player Multi-armed Bandits with Collision Communications [article]

Chengshuai Shi, Cong Shen
2021 arXiv   pre-print
We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective.  ...  without collision information in an adversarial environment.  ...  In MP-MAB, multiple players simultaneously play the bandit game without explicit communications and interact with each other only through arm collisions.  ... 
arXiv:2011.01090v2 fatcat:s3anqbrcubhuvjm2lk7mxs44ay

Federated Multi-Armed Bandits [article]

Chengshuai Shi, Cong Shen
2021 arXiv   pre-print
Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning.  ...  We show that, somewhat surprisingly, the order-optimal regret can be achieved independent of the number of clients with a careful choice of the update periodicity.  ...  Nonstochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without. In Conference on Learning Theory, 961-987. PMLR.  ... 
arXiv:2101.12204v2 fatcat:x56x4dej6rfwtjjagr6yqyc334

A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players [article]

Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet
2020 arXiv   pre-print
We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero  ...  We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm  ...  Multiplayer bandits without observing collision information. arXiv preprint arXiv:1808.08416. Magesh, A. and Veeravalli, V. V. (2019).  ... 
arXiv:1902.01239v4 fatcat:ohiwofaatfeynell7vwjptnjyu
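A common building block in this line of work is an orthogonalization phase in which players use collisions to settle on distinct arms. The sketch below is a generic Musical-Chairs-style routine for illustration only, not the algorithm of any paper above: each unseated player picks a uniformly random arm and "sits" on the first arm it plays without a collision.

```python
import random

def musical_chairs(num_players, num_arms, rounds, seed=0):
    """Musical-Chairs-style orthogonalization sketch: each player keeps
    sampling random arms until it plays one collision-free, then stays."""
    rng = random.Random(seed)
    seats = [None] * num_players          # fixed arm per player, once found
    for _ in range(rounds):
        picks = [seats[p] if seats[p] is not None
                 else rng.randrange(num_arms) for p in range(num_players)]
        for p in range(num_players):
            if seats[p] is None and picks.count(picks[p]) == 1:
                seats[p] = picks[p]       # no collision: claim this arm
    return seats

seats = musical_chairs(3, 5, rounds=200)  # 3 players settle on distinct arms
```

With enough rounds, all players end up on pairwise distinct arms; learning-phase logic (exploration, rank estimation, communication through deliberate collisions) would be layered on top of this.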

Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions [article]

Chengshuai Shi, Cong Shen
2021 arXiv   pre-print
We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm.  ...  Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound.  ...  We further make the assumption that the optimal arm-player assignment always has at most one player on each of the top-M arms without collision.  ... 
arXiv:2106.13669v1 fatcat:mtgir2yyo5e5lcki3wd3czxucm

Multi-Player Bandits: The Adversarial Case [article]

Pragnya Alatur, Kfir Y. Levy, Andreas Krause
2019 arXiv   pre-print
In this work, we design the first Multi-player Bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary.  ...  We consider a setting where multiple players sequentially choose among a common set of actions (arms).  ...  Despite impressive progress on Multi-player Bandit problems, existing works only address the stochastic setting where the environment is stationary.  ... 
arXiv:1902.08036v1 fatcat:jmcbohkg55gdnasuejiz2umvlu

Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game [article]

Wenbo Wang, Amir Leshem, Dusit Niyato, Zhu Han
2021 arXiv   pre-print
Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error.  ...  They also have to reach an efficient, collision-free solution of channel allocation with limited coordination.  ...  Multi-Player (MP) Multi-Armed Bandits (MAB).  ... 
arXiv:2003.13314v3 fatcat:qygpagrcffeu5cbb7mqc7tgy6u

Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics [article]

Haoyang Liu, Keqin Liu, Qing Zhao
2011 arXiv   pre-print
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses M out of N arms to play at each time.  ...  We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange.  ...  Restless Multi-Armed Bandit with Unknown Dynamics In this paper, we consider Restless Multi-Armed Bandit (RMAB), a generalization of the classic MAB.  ... 
arXiv:1011.4969v2 fatcat:vcwyqj7tbfduzdstpxm7luznia
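The restless Markovian reward model in the entry above can be sketched as follows (a minimal illustration under assumed parameters, not the paper's exact setup): each arm is a two-state Markov chain that keeps evolving whether or not it is played, and the player collects the states of the M arms it chooses each round.

```python
import random

def step_restless_arms(states, p01, p10, rng):
    """Advance every arm's two-state Markov chain one step,
    whether or not the arm is played (the 'restless' property)."""
    return [(1 if rng.random() < p01 else 0) if s == 0
            else (0 if rng.random() < p10 else 1)
            for s in states]

def play(states, chosen):
    """Reward is the current state (0 or 1) of each chosen arm."""
    return sum(states[a] for a in chosen)

rng = random.Random(1)
states = [0, 1, 0, 1]                      # N = 4 arms
total = 0
for _ in range(100):
    total += play(states, chosen=[0, 1])   # play M = 2 of N = 4 arms
    states = step_restless_arms(states, p01=0.3, p10=0.2, rng=rng)
```

The learning problem studied in the paper is to do this without knowing the transition probabilities, and to compete with a suitable benchmark policy.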

Bandit Learning in Decentralized Matching Markets [article]

Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan
2021 arXiv   pre-print
Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition.  ...  We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences  ...  Problem Setting We consider a multiplayer multi-armed bandit problem with N players and L stochastic arms, with N ≤ L.  ... 
arXiv:2012.07348v4 fatcat:r6p7qsxzlbfxpo3c3ondf62ec4

Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [article]

Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri
2021 arXiv   pre-print
We formulate this problem as the ϵ-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players  ...  In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information  ...  In contrast, we study the multi-player setting, where all players learn continually and concurrently. Collisions in multi-player bandits.  ... 
arXiv:2010.15390v2 fatcat:obhmf2l7zvcjbns6ailmpbwt7u

On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits [article]

Naumaan Nayyar, Dileep Kalathil, Rahul Jain
2016 arXiv   pre-print
In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution.  ...  We consider the problem of learning in single-player and multiplayer multiarmed bandit models.  ...  This motivates the non-Bayesian setting.  ... 
arXiv:1505.00553v2 fatcat:7zbiwjvn6nhi3jsjqsfhma75gm

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems [article]

Cem Tekin, Mingyan Liu
2015 arXiv   pre-print
We propose a learning algorithm with logarithmic regret uniformly over time with respect to the optimal finite horizon policy. Our results extend the optimal adaptive learning of MDPs to POMDPs.  ...  In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems.  ...  RELATED WORK Related work on multi-armed bandit problems started with the seminal paper by Lai and Robbins [3] , where asymptotically optimal adaptive policies for arms with iid reward processes (referred  ... 
arXiv:1107.4042v3 fatcat:fxczzt4sl5aapfaeynsuatzp24

Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits [article]

Francesc Wilhelmi, Cristina Cano, Gergely Neu, Boris Bellalta, Anders Jonsson, Sergio Barrachina-Muñoz
2018 arXiv   pre-print
We rely on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to learn their best configuration.  ...  Our results show that optimal proportional fairness can be achieved, even when no information about neighboring networks is available to the learners and Wireless Networks (WNs) operate selfishly.  ...  and the non-stochastic (or adversarial) bandit problem where the rewards are chosen arbitrarily by the environment.  ... 
arXiv:1710.11403v3 fatcat:b6cpppgdnze5vf7zkzpvc4glfu

Introduction to Multi-Armed Bandits [article]

Aleksandrs Slivkins
2022 arXiv   pre-print
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty.  ...  The chapters on "bandits with similarity information", "bandits with knapsacks" and "bandits and agents" can also be consumed as standalone surveys on the respective topics.  ...  Chapter 4 Bandits with Similarity Information We consider stochastic bandit problems in which an algorithm has auxiliary information on similarity between arms.  ... 
arXiv:1904.07272v7 fatcat:pptyhyyshrdyhhf7bdonz5dsv4
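As a companion to this survey entry, here is a minimal sketch of UCB1, a canonical algorithm for the basic stochastic bandit setting the book opens with (the parameters and Bernoulli arms below are illustrative): play each arm once, then play the arm maximizing empirical mean plus an exploration bonus sqrt(2 ln t / n_pulls).

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 on Bernoulli arms: after one initial pull per arm,
    pick the arm maximizing mean estimate + sqrt(2 ln t / pulls)."""
    rng = random.Random(seed)
    K = len(arm_means)
    pulls = [0] * K
    sums = [0.0] * K
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1                    # initialization: one pull each
        else:
            arm = max(range(K),
                      key=lambda a: sums[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        r = float(rng.random() < arm_means[arm])   # Bernoulli reward
        pulls[arm] += 1
        sums[arm] += r
        total_reward += r
    return pulls, total_reward

pulls, _ = ucb1([0.2, 0.8], horizon=2000)  # concentrates on the 0.8 arm
```

The multi-player papers above all build on single-player primitives of roughly this shape, adding coordination (or collision-based communication) across players.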

Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics

Haoyang Liu, Keqin Liu, Qing Zhao
2013 IEEE Transactions on Information Theory  
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses one out of N arms to play at each time.  ...  We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange.  ...  Multi-Armed Bandit with i.i.d. and Rested Markovian Reward Models: In the classic multi-armed bandit (MAB) with an i.i.d. reward model, there are N independent arms and a single player.  ... 
doi:10.1109/tit.2012.2230215 fatcat:rz73z6gqsjhalkco5mk5lmi5ny
Showing results 1 — 15 of 37.