Multi-player multi-armed bandits: Decentralized learning with IID rewards

Dileep Kalathil, Naumaan Nayyar, Rahul Jain
2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
We consider the decentralized multi-armed bandit problem with distinct arms for each player. Each player can pick one arm at each time instant and receives a random reward drawn from an unknown distribution with an unknown mean. The arms give different rewards to different players. If more than one player selects the same arm, each of those players gets zero reward. There is no dedicated control channel for communication or coordination among the users. We propose an online learning algorithm called dUCB4 which … achieves a near-$O(\log^2 T)$ regret. The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks, wherein they must pick among various wireless channels that look different to different users.
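The collision model described above can be made concrete with a small simulation sketch. It is not the authors' dUCB4 algorithm. It only models one time step of the reward rule and computes the best collision-free assignment of players to arms. The names `collision_rewards` and `best_matching_value` are hypothetical. Bernoulli rewards are an illustrative assumption. Treating the best assignment as the regret benchmark is also an assumption here. It follows from the abstract's setting, where each player sees different means and colliding players get nothing, so the best a coordinated system could do is give every player a distinct arm.

```python
import itertools
import random


def collision_rewards(choices, means, rng):
    """Rewards for one time step of the collision model (illustrative sketch).

    choices[i] is the arm picked by player i; means[i][k] is player i's mean
    reward on arm k (unknown to the players). Bernoulli rewards are an
    assumption made for illustration, not taken from the paper.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            # Collision: every player on this arm receives zero reward.
            rewards.append(0.0)
        else:
            # No collision: draw a Bernoulli reward with this player's mean.
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards


def best_matching_value(means):
    """Largest total expected reward over collision-free assignments.

    Brute force over all ways to give each player a distinct arm (assumes at
    least as many arms as players). Exponential in general, so only suitable
    for a handful of players and arms.
    """
    n_players, n_arms = len(means), len(means[0])
    best = 0.0
    for arms in itertools.permutations(range(n_arms), n_players):
        value = sum(means[p][a] for p, a in enumerate(arms))
        best = max(best, value)
    return best


if __name__ == "__main__":
    # Two players, three arms; each player sees different means (hypothetical values).
    means = [[0.9, 0.5, 0.1],
             [0.8, 0.6, 0.2]]
    rng = random.Random(0)
    print(collision_rewards([0, 0], means, rng))  # both pick arm 0: [0.0, 0.0]
    print(best_matching_value(means))             # player 0 -> arm 0, player 1 -> arm 1: 1.5
```

In this toy instance both players individually prefer arm 0. Neither gets anything if both choose it. The best assignment instead sends player 1 to arm 1. This tension is what a decentralized learner must resolve without a control channel.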
doi:10.1109/allerton.2012.6483307 dblp:conf/allerton/KalathilNJ12 fatcat:ckg7wlgdcfbw5hpmrtraej7zg4