Multi-Feedback Bandit Learning with Probabilistic Contexts
2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
The contextual bandit is a classic multi-armed bandit setting in which side information (i.e., context) is available before arm selection. A standard assumption is that exact contexts are perfectly known prior to arm selection and that only a single feedback is returned. In this work, we focus on multi-feedback bandit learning with probabilistic contexts, where a bundle of contexts is revealed to the agent along with their corresponding probabilities at the beginning of each round. This models such
doi:10.24963/ijcai.2020/423
dblp:conf/ijcai/HuangXWFY20
fatcat:ponhjxdxfjhlhjbaqyque2synm