A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with the objective to minimize the total regret incurred over a series of tasks. While most bandit algorithms are designed to have a low worst-case regret, we examine here the average regret over bandit instances drawn from some prior distribution which mayarXiv:2012.14264v1 fatcat:rycomukpefcurg6ccipuu2esqy