Incentivising Exploration and Recommendations for Contextual Bandits with Payments [article]

Priyank Agrawal, Theja Tulabandhula
2020 arXiv pre-print
We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theoretical bounds on the cumulative costs of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
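The payment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a linear reward model, a myopic user who picks the item with the highest estimated reward plus payment, and a platform that would like the user to try the item with the highest upper confidence bound. The names `incentivized_round`, `theta_hat`, `A_inv`, `alpha` and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def incentivized_round(contexts, theta_hat, A_inv, alpha=1.0, eps=1e-9):
    """One round of a payment-based exploration sketch (illustrative assumption).

    contexts  : (n_items, d) array, one feature vector per item
    theta_hat : (d,) current estimate of the reward parameter
    A_inv     : (d, d) inverse of the regularized design matrix
    Returns (target_item, payment, item_chosen_by_user).
    """
    contexts = np.asarray(contexts, dtype=float)
    # What a myopic user believes each item is worth right now.
    greedy = contexts @ theta_hat
    # Exploration bonus per item: alpha * sqrt(x^T A_inv x).
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
    # The platform would like this item to be tried (optimistic choice).
    target = int(np.argmax(greedy + bonus))
    # Smallest payment that makes the target strictly best for the user;
    # eps breaks the tie. Exact ties among unpaid items are ignored here.
    gap = float(greedy.max() - greedy[target])
    payment = gap + eps if gap > 0 else 0.0
    payments = np.zeros(len(greedy))
    payments[target] = payment
    # The myopic user maximizes estimated reward plus payment.
    chosen = int(np.argmax(greedy + payments))
    return target, payment, chosen

# Item 0 looks better now, but item 1 has barely been explored.
contexts = np.array([[1.0, 0.0], [0.0, 1.0]])
theta_hat = np.array([0.5, 0.2])
A_inv = np.diag([0.01, 1.0])
target, payment, chosen = incentivized_round(contexts, theta_hat, A_inv)
```

In this toy round the platform pays roughly the gap between the user's best-looking item and the item it wants explored, so the cost of incentivization shrinks as the estimates for under-explored items improve.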
arXiv:2001.07853v1 fatcat:7ed2qdnxrze2bmna7ezzijo25e