A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
The file type is application/pdf.
On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem
2010
2010 Ninth International Conference on Machine Learning and Applications
Many sequential decision making problems require an agent to balance exploration and exploitation to maximise long-term reward. Existing policies that address this tradeoff typically have parameters that are set a priori to control the amount of exploration. In finite-time problems, the optimal values of these parameters are highly dependent on the problem faced. In this paper, we propose adapting the amount of exploration performed on-line, as information is gathered by the agent. To this end
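The abstract notes that existing policies control exploration with a parameter fixed a priori. A generic ε-greedy policy on a one-armed bandit with covariates shows this. The sketch below is not the method proposed in the paper, since the abstract is cut off before describing it. The linear reward model, the function name `epsilon_greedy_one_armed`, and its parameters are all hypothetical assumptions made for illustration.

```python
import random


def epsilon_greedy_one_armed(covariates, reward_fn, epsilon, known_reward=0.0, seed=0):
    """Hypothetical illustration: epsilon-greedy on a one-armed bandit with covariates.

    Assumption: the unknown arm's reward is linear in the covariate, r = beta * x
    (plus any noise reward_fn chooses to add). beta is estimated by least squares
    through the origin from the unknown-arm pulls seen so far. The known arm always
    pays known_reward. epsilon is fixed a priori and sets the exploration rate.
    Returns (total reward, number of pulls of the unknown arm).
    """
    rng = random.Random(seed)
    sxx = 0.0  # running sum of x^2 over unknown-arm pulls
    sxy = 0.0  # running sum of x * r over unknown-arm pulls
    total = 0.0
    pulls = 0
    for x in covariates:
        # Current least-squares estimate of beta (0 before any data).
        beta_hat = sxy / sxx if sxx > 0 else 0.0
        explore = rng.random() < epsilon
        # Pull the unknown arm when exploring or when it looks better than the known arm.
        if explore or beta_hat * x > known_reward:
            r = reward_fn(x)
            sxx += x * x
            sxy += x * r
            pulls += 1
        else:
            r = known_reward
        total += r
    return total, pulls


if __name__ == "__main__":
    xs = [1.0, -1.0, 0.5]
    # Noise-free unknown arm with true beta = 2 (an assumption for the demo).
    print(epsilon_greedy_one_armed(xs, lambda x: 2.0 * x, epsilon=0.0))
    print(epsilon_greedy_one_armed(xs, lambda x: 2.0 * x, epsilon=1.0))
```

With `epsilon=0.0` the policy never tries the unknown arm, because its initial estimate never exceeds the known reward. With `epsilon=1.0` it pulls the unknown arm every time, including when the covariate makes that arm a poor choice. Which intermediate value works best depends on the problem faced, and that dependence is the motivation the abstract gives for adapting the amount of exploration on-line.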
doi:10.1109/icmla.2010.74
dblp:conf/icmla/SykulskiAJ10
fatcat:ayw4h7wjq5hqzaoulkacea4dpm