The file type is application/pdf.
Bandits with concave rewards and convex knapsacks
2014
Proceedings of the fifteenth ACM conference on Economics and computation - EC '14
In this paper, we consider a very general model for the exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the
doi:10.1145/2600057.2602844
dblp:conf/sigecom/AgrawalD14
fatcat:jodo3ehfmjgwjfxjcbjxa6u3fm