Lifelong Learning in Multi-Armed Bandits [article]

Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
2020, arXiv preprint
Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework, with the objective of minimizing the total regret incurred over a series of tasks. While most bandit algorithms are designed to have a low worst-case regret, we examine here the average regret over bandit instances drawn from some prior distribution, which may change over time. We specifically focus on confidence-interval tuning of UCB algorithms. We propose a bandit-over-bandit approach with greedy algorithms, and we perform extensive experimental evaluations in both stationary and non-stationary environments. We further apply our solution to the mortal bandit problem, showing empirical improvement over previous work.
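To illustrate the confidence-interval tuning the abstract refers to, here is a minimal sketch of a UCB index whose exploration bonus is scaled by a tunable parameter `alpha`; in a lifelong-learning setting, `alpha` would be chosen per task based on prior tasks. All names and the Bernoulli-reward setup are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import random


def run_ucb(means, alpha, horizon, rng):
    """Run UCB with exploration parameter `alpha` on a Bernoulli bandit.

    The index is mean_i + sqrt(alpha * log(t) / n_i); shrinking alpha
    explores less, which pays off when the prior distribution over
    instances makes arm gaps large. Returns the cumulative pseudo-regret.
    """
    k = len(means)
    counts = [0] * k       # pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization: play each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(alpha * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


# Example: compare two exploration levels on one task instance.
rng = random.Random(0)
regret_wide = run_ucb([0.1, 0.5, 0.9], alpha=2.0, horizon=2000, rng=rng)
regret_narrow = run_ucb([0.1, 0.5, 0.9], alpha=0.25, horizon=2000, rng=rng)
```

A bandit-over-bandit scheme would treat candidate `alpha` values as arms of an outer bandit and select among them greedily across tasks, as the abstract describes.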
arXiv:2012.14264v1