Multi-player multi-armed bandits: Decentralized learning with IID rewards
2012
2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
We consider the decentralized multi-armed bandit problem with distinct arms for each player. At each time instant, each player picks one arm and receives a random reward drawn from an unknown distribution with an unknown mean. The arms give different rewards to different players. If more than one player selects the same arm, each of those players receives zero reward. There is no dedicated control channel for communication or coordination among the users. We propose an online learning algorithm called dUCB4 which
doi:10.1109/allerton.2012.6483307
dblp:conf/allerton/KalathilNJ12
fatcat:ckg7wlgdcfbw5hpmrtraej7zg4
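The reward model described in the abstract (per-player arm means, one arm per player per time step, zero reward on collision) can be sketched as a single simulation step. This is a minimal illustration, not the dUCB4 algorithm. The function name `play_round`, the `means` table, and the choice of Bernoulli rewards are all hypothetical: the abstract says only that rewards come from an unknown distribution with an unknown mean.

```python
import random
from collections import Counter
from typing import List, Sequence


def play_round(
    choices: Sequence[int],
    means: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Return the reward each player receives in one time step.

    choices[i] is the arm picked by player i.
    means[i][k] is the mean reward of arm k for player i (unknown to the
    players). Assumption for illustration: rewards are Bernoulli(means[i][k]).
    """
    # Count how many players picked each arm to detect collisions.
    counts = Counter(choices)
    rewards: List[float] = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            # Collision: every player on this arm gets zero reward.
            rewards.append(0.0)
        else:
            # No collision: draw an IID Bernoulli reward with this
            # player's own mean for the chosen arm.
            p = means[player][arm]
            rewards.append(1.0 if rng.random() < p else 0.0)
    return rewards
```

For example, with every mean set to 1.0, `play_round([0, 0], means, rng)` returns `[0.0, 0.0]` because both players collide on arm 0, while `play_round([0, 1], means, rng)` returns `[1.0, 1.0]`. Because `means` is indexed by player first, the same arm can be worth different amounts to different players, matching the abstract's setting.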