Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions [article]

Sébastien Bubeck, Thomas Budzinski, Mark Sellke
2020 arXiv   pre-print
We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness.  ...  In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms.  ...  Introduction We consider the cooperative multi-player version of the classical stochastic multi-armed bandit problem. We denote by m the number of players and by K ≥ m the number of arms.  ... 
arXiv:2011.03896v1 fatcat:7qegn35hxjcezastnbon43gj7e

Selfish Robustness and Equilibria in Multi-Player Bandits [article]

Etienne Boursier, Vianney Perchet
2020 arXiv   pre-print
Motivated by cognitive radios, stochastic multi-player multi-armed bandits gained a lot of interest recently.  ...  In this class of problems, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.  ...  , operations research and their interactions with data sciences.  ... 
arXiv:2002.01197v2 fatcat:yaup2oxkgfd5zd3gzswltxxa7m

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without [article]

Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke
2019 arXiv   pre-print
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem.  ...  The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss.  ...  Acknowledgement This work was supported in part by an NSF graduate fellowship and a Stanford graduate fellowship.  ... 
arXiv:1904.12233v2 fatcat:pv5u3qfa7jcp7elowk7w52rnya

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization [article]

Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang
2021 arXiv   pre-print
Despite the significant interests and many progresses in decentralized multi-player multi-armed bandits (MP-MAB) problems in recent years, the regret gap to the natural centralized lower bound in the heterogeneous  ...  BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration.  ...  ., 2010), the multi-player version of the multi-armed bandits problem (MP-MAB) has sparked significant interests in recent years.  ... 
arXiv:2110.14622v2 fatcat:zomzl6n3sffpfhwona65j7igna

Bandit Learning in Decentralized Matching Markets [article]

Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan
2021 arXiv   pre-print
Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition.  ...  We introduce a new algorithm for this setting that, over a time horizon T, attains 𝒪(log(T)) stable regret when preferences of the arms over players are shared, and 𝒪(log(T)^2) regret when there are  ...  Problem Setting We consider a multiplayer multi-armed bandit problem with N players and L stochastic arms, with N ≤ L.  ... 
arXiv:2012.07348v4 fatcat:r6p7qsxzlbfxpo3c3ondf62ec4

Channel Selection for Network-Assisted D2D Communication via No-Regret Bandit Learning With Calibrated Forecasting

Setareh Maghsudi, Slawomir Stanczak
2015 IEEE Transactions on Wireless Communications  
This scenario is modeled as a multi-player multi-armed bandit game with side information, for which a distributed algorithmic solution is proposed.  ...  The solution is a combination of no-regret learning and calibrated forecasting, and can be applied to a broad class of multi-player stochastic learning problems, in addition to the formulated channel selection  ...  Single-Player and Multi-Player Multi-Armed Bandit Single-player multi-armed bandit game (SP-MAB, hereafter) is a class of sequential decision making problems with limited information.  ... 
doi:10.1109/twc.2014.2365803 fatcat:lyqpadjarnhlzm5m36vnlqfhvy

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [article]

Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
2021 arXiv   pre-print
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem.  ...  Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent  ...  Introduction We consider the cooperative Multi-Player version of the Multi-Armed bandit problem.  ... 
arXiv:2111.04873v1 fatcat:6wyumnbq5nctbobrrzqay4rj6u

Stochastic Multi-Player Multi-Armed Bandits with Multiple Plays for Uncoordinated Spectrum Access

Marie-Josepha Youssef, Venugopal V. Veeravalli, Joumana Farah, Charbel Abdel Nour
2020 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications  
Index Terms: uncoordinated spectrum access, multi-armed bandits with multiple plays, varying reward distribution.  ...  In this paper, an algorithm based on the multiplayer multi-armed bandit (MAB) framework is proposed to solve an uncoordinated spectrum access problem.  ...  Akshayaa Magesh (UIUC) for her help with the subject of MABs and for useful discussions.  ... 
doi:10.1109/pimrc48278.2020.9217349 dblp:conf/pimrc/YoussefVFN20 fatcat:c2hngpe2mrerzh5qunni64rvyi

Resource Allocation in NOMA-based Self-Organizing Networks using Stochastic Multi-Armed Bandits [article]

Marie Josepha Youssef, Venugopal V. Veeravalli, Joumana Farah, Charbel Abdel Nour, Catherine Douillard
2021 arXiv   pre-print
Based on the multi-player multi-armed bandit (MAB) framework, the proposed technique does not require any communication or coordination between the APs.  ...  This results in an MAB model with varying channel rewards, multiple plays and non-zero reward on collision.  ...  The related framework of multi-player multi-armed bandits (MAB) [11] has also been widely used to study multiple problems in wireless communication systems ranging from SON [12]-[14], to uncoordinated  ... 
arXiv:2101.06340v1 fatcat:zqbce7d2qvg6xjsmd3rcjduzxa

Federated Multi-Armed Bandits Under Byzantine Attacks [article]

Ilker Demirel, Yigit Yildirim, Cem Tekin
2022 arXiv   pre-print
Federated multi-armed bandits (FMAB) is a recently emerging framework where a cohort of learners with heterogeneous local models play a MAB game and communicate their aggregated feedback to a parameter  ...  We analyze the interplay between the algorithm parameters, unavoidable error margin, regret, communication cost, and the arms' suboptimality gaps.  ...  Neither the server nor any honest client is aware of whether a client is Byzantine.  ... 
arXiv:2205.04134v1 fatcat:zcleooiygrbpri7l67uhw5avxm

Gateway Selection in Millimeter Wave UAV Wireless Networks Using Multi-Player Multi-Armed Bandit

Ehab Mahmoud Mohamed, Sherief Hashima, Abdallah Aldosary, Kohei Hatano, Mahmoud Ahmed Abdelghany
2020 Sensors  
A tool of machine learning (ML) is exploited to address the problem as a budget-constrained multi-player multi-armed bandit (MAB) problem.  ...  In this decentralized setting, where information is neither prior available nor exchanged among UAVs, a selfish and concurrent multi-player MAB strategy is suggested.  ...  Figure 3. General multi-player multi-armed bandit (MAB) protocol.  ... 
doi:10.3390/s20143947 pmid:32708559 fatcat:3wjwvpl6kzajxazkz46j7frv3q

Distributed Online Learning for Coexistence in Cognitive Radar Networks [article]

William Howard, Anthony Martone, R. Michael Buehrer
2022 arXiv   pre-print
For this task we specifically select the multi-player multi-armed bandit (MMAB) model, which poses the problem as a sequential game, where each radar node in a network makes independent selections of center  ...  Specifically, we model a network of cooperative, independent, and non-communicating radar nodes which must share resources within the network as well as with non-cooperative nearby emitters.  ...  The use of multi-player multi-armed bandit models is discussed to address this problem.  ... 
arXiv:2203.02327v2 fatcat:lfiz4asobnaklakfc3mj2m6spi

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms [article]

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
2021 arXiv   pre-print
they address, i.e., fully cooperative, fully competitive, and a mix of the two.  ...  ., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively  ...  The most common tree policy is to apply the UCB1 (UCB stands for upper confidence bound) algorithm, which was originally devised for stochastic multi-arm bandit problems [54, 55], to each node of the  ... 
arXiv:1911.10635v2 fatcat:ihlhtjlhnrdizbkcfzsnz5urfq

Decentralized Learning in Online Queuing Systems [article]

Flore Sentenac, Etienne Boursier, Vianney Perchet
2021 arXiv   pre-print
We first argue that for ratios up to 2, cooperation is required for stability of learning strategies, as selfish minimization of policy regret, a patient notion of regret, might indeed still be unstable  ...  Stability with decentralized learning strategies with a ratio below 2 was a major remaining question.  ...  It is a particular instance of stochastic multi-armed bandits, a celebrated online learning model, where the agent repeatedly takes an action within a finite set and observes its associated reward.  ... 
arXiv:2106.04228v2 fatcat:45qxpqefpjfxlnzefkscv2lovi

Anti-Jamming Game to Combat Intelligent Jamming for Cognitive Radio Networks

Khalid Ibrahim, Soon Xin Ng, Ijaz Mansoor Qureshi, Aqdas Naveed Malik, Sami Muhaidat
2021 IEEE Access  
In jamming, an attacker jams the communication by transmitting a high power noise signal in the vicinity of the targeted node.  ...  We consider a realistic mathematical model, where the channel conditions are time-varying and differ from one sub-channel to another, as in practical scenarios.  ...  The authors formulated the problem of anti-jamming multi-channel access in CRN as a non stochastic multi-armed bandit problem, where both secondary sender and receiver chooses their common operating channels  ... 
doi:10.1109/access.2021.3117563 fatcat:w5z73pxcevbnpafekfbwaf3hne