Combinatorial Multi-armed Bandits in Competitive Environments

Jinhang Zuo
Multi-armed bandits (MAB) have attracted much attention as a means of capturing the exploration and exploitation tradeoff in sequential decision making. In the classical MAB problem, at each round, a player chooses one arm from a fixed arm set and receives a random reward based on an unknown distribution. Nevertheless, in many real-world applications, the problems have a combinatorial nature among multiple arms and possibly non-linear reward functions. Combinatorial multi-armed bandits (CMAB) have been extensively studied for these settings, and most previous works consider CMAB from a single player's perspective: at each round, one player chooses a set of arms to play, observes the feedback from them and receives a reward. However, motivated by applications such as online advertising (i.e., advertisers put ads on websites to attract user clicks), there might exist multiple players (advertisers) competing over the same set of arms (websites). This competition among players has been less studied and brings significant challenges to the design and analysis of bandit algorithms. In this thesis, we introduce the competitive CMAB problem from two different perspectives. We first consider competitive CMAB from the follower's perspective, where a follower and a competitor play with the same set of arms. We assume the follower can choose his action after observing the action of the competitor and study how the follower can maximize his own reward given the competitor's actions. We then introduce competitive CMAB from the multi-players' perspective, where multiple players choose combinatorial actions on the same set of arms. Our objective is to design bandit algorithms that maximize the collective reward across all players. We provide general formulations of both settings and design bandit algorithms with theoretical guarantees for real-world applications, including social influence maximization, dynamic channel allocation, and general resource allocation.
doi:10.1184/r1/21441456.v1 fatcat:5jdw74dgz5duppzhsh46bwjvtq