Near-optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
2021, arXiv preprint
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits of dimension d concurrently, and these bandits share a common k-dimensional linear representation, with k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T the number of rounds per task. This regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently. We further develop a lower bound showing that our regret is near-optimal when d > M. We then extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error. To the best of our knowledge, this is the first theoretical result that characterizes the benefits of multi-task representation learning for exploration in RL with function approximation.
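The abstract's core premise, that M tasks sharing a k-dimensional subspace (k ≪ d, k ≪ M) can be estimated jointly more accurately than independently, can be illustrated with a small NumPy sketch. This is not the MTLR-OFUL algorithm from the paper; it is a simplified stand-in that recovers the shared subspace via an SVD of stacked per-task ridge estimates and then re-fits each task inside that subspace. All variable names and the data-generating parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, M, n = 30, 3, 20, 100  # ambient dim, shared dim, tasks, samples/task

# True shared representation B (d x k) and per-task weights w_m (k,),
# so each task parameter is theta_m = B w_m (a rank-k parameter matrix).
B, _ = np.linalg.qr(rng.normal(size=(d, k)))
W = rng.normal(size=(k, M))
Theta = B @ W  # (d, M), one column per task

# Linear reward observations y = x^T theta_m + noise for each task.
X = rng.normal(size=(M, n, d))
Y = np.einsum("mnd,dm->mn", X, Theta) + 0.5 * rng.normal(size=(M, n))

# Baseline: independent ridge regression per task.
lam = 1.0
Theta_ind = np.stack(
    [np.linalg.solve(X[m].T @ X[m] + lam * np.eye(d), X[m].T @ Y[m])
     for m in range(M)],
    axis=1,
)  # (d, M)

# Shared approach: estimate the k-dim subspace as the top-k left singular
# vectors of the stacked independent estimates (a simple surrogate for the
# joint optimization in the paper), then re-fit each task in k dimensions.
U, _, _ = np.linalg.svd(Theta_ind, full_matrices=False)
B_hat = U[:, :k]
Theta_shared = np.zeros_like(Theta_ind)
for m in range(M):
    Z = X[m] @ B_hat  # projected features, (n, k)
    w_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ Y[m])
    Theta_shared[:, m] = B_hat @ w_hat

err_ind = np.linalg.norm(Theta_ind - Theta, "fro")
err_shared = np.linalg.norm(Theta_shared - Theta, "fro")
print(f"independent error: {err_ind:.3f}, shared error: {err_shared:.3f}")
```

Re-fitting in the learned subspace estimates k parameters per task instead of d, which is the same d-versus-k variance reduction that drives the improvement from Õ(Md√T) to the Õ(M√(dkT) + d√(kMT)) bound stated above.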
arXiv:2102.04132v1