Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
2006
Journal of Machine Learning Research
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is
dblp:journals/jmlr/Even-DarMM06
fatcat:sqxognjgarb6ze2ihm42vkpw3i
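
To illustrate the kind of action elimination the abstract refers to, here is a minimal sketch of a median-elimination-style procedure for the bandit setting. It is not taken from the abstract: the per-round sample count, the ε and δ schedules, and the names `median_elimination` and `make_bernoulli_arms` are assumptions chosen for illustration, and rewards are assumed to lie in [0, 1]. Each round samples every surviving arm, discards the worse half by empirical mean, and tightens the accuracy and confidence parameters, so the total number of pulls grows roughly like (n/ε²) log(1/δ).

```python
import math
import random


def make_bernoulli_arms(means, seed=0):
    """Hypothetical test environment: arm i pays 1 with probability means[i]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < means[arm] else 0.0

    return pull


def median_elimination(pull, n_arms, eps, delta):
    """Return (arm, total_pulls); the arm is intended to be eps-optimal
    with probability at least 1 - delta. Constants are illustrative assumptions."""
    arms = list(range(n_arms))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    total_pulls = 0
    while len(arms) > 1:
        # Number of pulls per surviving arm in this round.
        t = math.ceil(4.0 / eps_l ** 2 * math.log(3.0 / delta_l))
        means = {a: sum(pull(a) for _ in range(t)) / t for a in arms}
        total_pulls += t * len(arms)
        # Keep the better half (rounded up) by empirical mean.
        ranked = sorted(arms, key=lambda a: means[a], reverse=True)
        arms = ranked[: (len(arms) + 1) // 2]
        # Tighten accuracy and confidence for the next round.
        eps_l *= 0.75
        delta_l /= 2.0
    return arms[0], total_pulls


if __name__ == "__main__":
    true_means = [0.1, 0.2, 0.9, 0.3]
    pull = make_bernoulli_arms(true_means, seed=0)
    best, used = median_elimination(pull, len(true_means), eps=0.2, delta=0.1)
    print("selected arm:", best, "total pulls:", used)
```

Because the number of surviving arms halves each round while the per-arm sample count grows only geometrically, the per-round costs sum to a constant multiple of the first round's cost, which is where the linear dependence on n comes from in this sketch.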