Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
[article] · 2019 · arXiv pre-print
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. ...
We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely T^{1-1/(2m)} where m is the number of players. ...
Sébastien Bubeck (Microsoft Research) and Yuanzhi Li
By making even stronger ...
arXiv:1904.12233v2
fatcat:pv5u3qfa7jcp7elowk7w52rnya
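The contrast between the two feedback models in this abstract is easy to make concrete. A minimal simulation sketch (our illustration, not the paper's algorithm; the arm count, player count, and Bernoulli means are assumptions):

```python
import random

K, m = 5, 3                                   # arms and players (assumed)
means = [random.random() for _ in range(K)]   # hypothetical Bernoulli arm means

def play_round(choices):
    """choices[i] is the arm pulled by player i; returns per-player feedback."""
    counts = {a: choices.count(a) for a in set(choices)}
    feedback = []
    for arm in choices:
        collided = counts[arm] > 1
        reward = 0.0 if collided else float(random.random() < means[arm])
        # Collision-information model: a player observes (reward, collided).
        # No-sensing model: a player observes only the reward, so a zero is
        # ambiguous between a collision and an unlucky draw.
        feedback.append((reward, collided))
    return feedback

print(play_round([random.randrange(K) for _ in range(m)]))
```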
On No-Sensing Adversarial Multi-player Multi-armed Bandits with Collision Communications
[article] · 2021 · arXiv pre-print
We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. ...
without collision information in an adversarial environment. ...
In MP-MAB, multiple players simultaneously play the bandit game without explicit communications and interact with each other only through arm collisions. ...
arXiv:2011.01090v2
fatcat:s3anqbrcubhuvjm2lk7mxs44ay
Federated Multi-Armed Bandits
[article] · 2021 · arXiv pre-print
Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning. ...
We show that, somewhat surprisingly, the order-optimal regret can be achieved independent of the number of clients with a careful choice of the update periodicity. ...
Nonstochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without. In Conference on Learning Theory, 961-987. PMLR. ...
arXiv:2101.12204v2
fatcat:x56x4dej6rfwtjjagr6yqyc334
A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players
[article] · 2020 · arXiv pre-print
We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero ...
We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm ...
Multiplayer bandits without observing collision information. arXiv preprint arXiv:1808.08416. Magesh, A. and Veeravalli, V. V. (2019). ...
arXiv:1902.01239v4
fatcat:ohiwofaatfeynell7vwjptnjyu
Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions
[article] · 2021 · arXiv pre-print
We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. ...
Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. ...
We further make an assumption that the optimal arm-player assignment always has at most one player on each of the top-M arms without collision. ...
arXiv:2106.13669v1
fatcat:mtgir2yyo5e5lcki3wd3czxucm
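The model change here is that a collision alters the reward distribution rather than zeroing the reward out. A tiny sketch of that reward model (our illustration; the two-mean parameterization and names are assumptions, and the paper's collision-coding machinery is not shown):

```python
import random

def pull(arm, collided, mean_free, mean_coll):
    """Bernoulli reward whose mean depends on whether the arm is collided."""
    mean = mean_coll[arm] if collided else mean_free[arm]
    return float(random.random() < mean)

# Example: arm 0 pays 0.8 on a collision-free pull but only 0.3 under collision.
print(pull(0, collided=True, mean_free=[0.8], mean_coll=[0.3]))
```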
Multi-Player Bandits: The Adversarial Case
[article] · 2019 · arXiv pre-print
In this work, we design the first Multi-player Bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary. ...
We consider a setting where multiple players sequentially choose among a common set of actions (arms). ...
Despite impressive progress on Multi-player Bandit problems, existing works only address the stochastic setting where the environment is stationary. ...
arXiv:1902.08036v1
fatcat:jmcbohkg55gdnasuejiz2umvlu
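For background on the adversarial machinery such an algorithm builds on, here is a minimal single-player EXP3 sketch (the standard adversarial-bandit primitive, not the paper's multi-player algorithm; the learning rate and the toy adversary are our assumptions):

```python
import math, random

def exp3(K, T, loss_fn, eta=0.05):
    """Minimal EXP3: exponential weights over importance-weighted losses."""
    weights = [1.0] * K
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        loss = loss_fn(t, arm)               # adversary's loss in [0, 1]
        est = loss / probs[arm]              # unbiased loss estimate
        weights[arm] *= math.exp(-eta * est)
    return probs

# Example: an adversary that makes arm 0 costly on even rounds.
print(exp3(K=3, T=2000, loss_fn=lambda t, a: 1.0 if a == 0 and t % 2 == 0 else 0.0))
```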
Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game
[article] · 2021 · arXiv pre-print
Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error. ...
They also have to reach an efficient, collision-free solution of channel allocation with limited coordination. ...
Multi-Player (MP) Multi-Armed Bandits (MAB). ...
arXiv:2003.13314v3
fatcat:qygpagrcffeu5cbb7mqc7tgy6u
Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics
[article] · 2011 · arXiv pre-print
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses M out of N arms to play at each time. ...
We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. ...
Restless Multi-Armed Bandit with Unknown Dynamics: In this paper, we consider the Restless Multi-Armed Bandit (RMAB), a generalization of the classic MAB. ...
arXiv:1011.4969v2
fatcat:vcwyqj7tbfduzdstpxm7luznia
Bandit Learning in Decentralized Matching Markets
[article] · 2021 · arXiv pre-print
Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition. ...
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences ...
Problem Setting We consider a multiplayer multi-armed bandit problem with N players and L stochastic arms, with N ≤ L. ...
arXiv:2012.07348v4
fatcat:r6p7qsxzlbfxpo3c3ondf62ec4
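The accept/reject mechanism in this market model fits in a few lines. A hedged sketch of one round (our illustration; the preference encoding and names are assumptions, and a real algorithm would wrap this in a bandit strategy such as UCB over each player's own estimates):

```python
def match_round(choices, arm_prefs):
    """choices[i]: arm pulled by player i; arm_prefs[a]: ranked player list.

    Each arm accepts only its most-preferred proposer; the rest are rejected
    and, in the decentralized model above, observe no reward.
    """
    accepted = {}
    for arm in set(choices):
        proposers = [p for p, a in enumerate(choices) if a == arm]
        accepted[arm] = min(proposers, key=arm_prefs[arm].index)
    return accepted

# Example: 3 players, 3 arms; arm 0 prefers player 2, then 0, then 1.
print(match_round([0, 0, 1], {0: [2, 0, 1], 1: [1, 0, 2], 2: [0, 1, 2]}))
```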
Multitask Bandit Learning Through Heterogeneous Feedback Aggregation
[article] · 2021 · arXiv pre-print
We formulate this problem as the ϵ-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players ...
In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information ...
In contrast, we study the multi-player setting, where all players learn continually and concurrently. Collisions in multi-player bandits. ...
arXiv:2010.15390v2
fatcat:obhmf2l7zvcjbns6ailmpbwt7u
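One natural way to exploit such a pairwise similarity bound is to pool a neighbor's samples while inflating the index by the worst-case bias those samples can introduce. A sketch of that idea (ours, not the paper's exact estimator; the constant and names are assumptions):

```python
import math

def pooled_ucb(own_sum, own_n, other_sum, other_n, eps, t, c=2.0):
    """UCB-style index that pools a neighbor's samples, widened by the bias bound."""
    n = own_n + other_n
    mean = (own_sum + other_sum) / n
    bias = eps * other_n / n       # borrowed samples are at most eps off in mean
    radius = math.sqrt(c * math.log(t) / n)
    return mean + bias + radius

# Example: 10 own pulls (6 successes) plus 40 borrowed pulls (28 successes).
print(pooled_ucb(6, 10, 28, 40, eps=0.1, t=1000))
```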
On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits
[article] · 2016 · arXiv pre-print
In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution. ...
We consider the problem of learning in single-player and multiplayer multiarmed bandit models. ...
This motivates the non-Bayesian setting. ...
arXiv:1505.00553v2
fatcat:7zbiwjvn6nhi3jsjqsfhma75gm
Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems
[article] · 2015 · arXiv pre-print
We propose a learning algorithm with logarithmic regret uniformly over time with respect to the optimal finite horizon policy. Our results extend the optimal adaptive learning of MDPs to POMDPs. ...
In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. ...
RELATED WORK Related work on multi-armed bandit problems started with the seminal paper by Lai and Robbins [3] , where asymptotically optimal adaptive policies for arms with iid reward processes (referred ...
arXiv:1107.4042v3
fatcat:fxczzt4sl5aapfaeynsuatzp24
Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits
[article] · 2018 · arXiv pre-print
We rely on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to learn their best configuration. ...
Our results show that optimal proportional fairness can be achieved, even when no information about neighboring networks is available to the learners and Wireless Networks (WNs) operate selfishly. ...
and the non-stochastic (or adversarial ) bandit problem where the rewards are chosen arbitrarily by the environment. ...
arXiv:1710.11403v3
fatcat:b6cpppgdnze5vf7zkzpvc4glfu
Introduction to Multi-Armed Bandits
[article] · 2022 · arXiv pre-print
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. ...
The chapters on "bandits with similarity information", "bandits with knapsacks" and "bandits and agents" can also be consumed as standalone surveys on the respective topics. ...
Chapter 4, Bandits with Similarity Information: We consider stochastic bandit problems in which an algorithm has auxiliary information on similarity between arms. ...
arXiv:1904.07272v7
fatcat:pptyhyyshrdyhhf7bdonz5dsv4
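The similarity setting of Chapter 4 is classically instantiated by Lipschitz bandits on [0, 1]. A minimal sketch of the standard fixed-discretization approach (our illustration, with an assumed grid-size tradeoff and parameters; the survey itself covers adaptive variants as well):

```python
import math, random

def lipschitz_ucb(mu, T, L=1.0):
    """UCB1 on a uniform grid whose resolution balances discretization error
    against the cost of more arms (K of order (L^2 T)^(1/3), logs ignored)."""
    K = max(2, int((L * L * T) ** (1 / 3)))
    grid = [k / (K - 1) for k in range(K)]
    n, s = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        ucb = [s[k] / n[k] + math.sqrt(2 * math.log(t) / n[k]) if n[k]
               else float("inf") for k in range(K)]
        k = ucb.index(max(ucb))
        s[k] += float(random.random() < mu(grid[k]))  # Bernoulli feedback
        n[k] += 1
    return grid[n.index(max(n))]  # most-played grid point

# Example: 1-Lipschitz mean function peaking at x = 0.7.
print(lipschitz_ucb(lambda x: 0.9 - abs(x - 0.7), T=5000))
```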
Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics
2013 · IEEE Transactions on Information Theory
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses one out of N arms to play at each time. ...
We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. ...
Multi-Armed Bandit with i.i.d. and Rested Markovian Reward Models: In the classic multi-armed bandit (MAB) with an i.i.d. reward model, there are N independent arms and a single player. ...
doi:10.1109/tit.2012.2230215
fatcat:rz73z6gqsjhalkco5mk5lmi5ny
Showing results 1 — 15 out of 37 results