An Information-Theoretic Analysis of Thompson Sampling [article]

Daniel Russo, Benjamin Van Roy
2015 arXiv pre-print
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
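For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of Thompson sampling on a Bernoulli bandit, plus a helper that computes the Shannon entropy of a distribution over which action is optimal. It is an illustration under stated assumptions, not the authors' code: the Beta(1, 1) priors, the function names `thompson_sampling` and `entropy`, and the example arm means are all hypothetical choices made here. The abstract does not state the exact form of the regret bound, so none is reproduced.

```python
import math
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return cumulative regret.

    Assumption (not from the record): independent Beta(1, 1) priors per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1] * k   # Beta posterior parameter: 1 + observed failures
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, then act greedily
        # with respect to the sampled means.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Partial feedback: only the chosen arm's reward is observed.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best_mean - true_means[arm]
    return regret


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical three-armed problem.
    print("cumulative regret:", thompson_sampling([0.2, 0.5, 0.8], horizon=500))
    # A prior that is nearly certain which arm is optimal has lower entropy
    # than a uniform prior over the optimal arm.
    print("uniform entropy:", entropy([0.25] * 4))
    print("concentrated entropy:", entropy([0.97, 0.01, 0.01, 0.01]))
```

The entropy helper is included because the abstract ties the regret bounds to the entropy of the optimal-action distribution: a decision-maker who starts out nearly certain of the best action has a low-entropy distribution, which is the quantity the bounds are said to scale with.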
arXiv:1403.5341v2 fatcat:ilkglh4n2jacrbcobximevgme4