Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as to the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the […]

arXiv:1711.07077v4
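As a minimal sketch of the feedback loop the abstract describes, the example below simulates an epsilon-greedy contextual bandit with per-arm ridge-regression outcome models. This is an illustrative assumption, not the paper's proposed method; the arm count, dimensions, epsilon, and ridge penalty are all made up for the example. The point it shows is that the action chosen in the present determines which (context, action, reward) triple is observed, and therefore which data the outcome models are fit on in later rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, dim, horizon = 3, 5, 2000
epsilon = 0.1   # exploration rate (assumed for illustration)
ridge = 1.0     # ridge penalty for the per-arm outcome models

# True (unknown) linear outcome model per arm, used only to simulate rewards.
theta_true = rng.normal(size=(n_arms, dim))

# Per-arm ridge-regression sufficient statistics: A = X'X + ridge*I, b = X'y.
A = np.stack([ridge * np.eye(dim) for _ in range(n_arms)])
b = np.zeros((n_arms, dim))

total_reward = 0.0
for t in range(horizon):
    x = rng.normal(size=dim)  # observed context

    # Current outcome-model estimates, fit only on the data that past
    # exploration/exploitation choices happened to collect.
    theta_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_arms)])
    predicted = theta_hat @ x

    # Epsilon-greedy: exploit the estimated model, explore uniformly w.p. epsilon.
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(predicted))

    reward = theta_true[arm] @ x + rng.normal(scale=0.5)  # simulated outcome
    total_reward += reward

    # Only the chosen arm's statistics are updated, so the exploration policy
    # shapes the bias and variance of every subsequent estimate.
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

print(f"average reward over {horizon} rounds: {total_reward / horizon:.3f}")
```

Because only the selected arm's model is updated each round, an arm that is rarely explored is estimated from few and potentially unrepresentative contexts; this is the interaction between exploration policy and outcome-model estimation that the abstract highlights.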