Near-optimal Representation Learning for Linear Bandits and Linear RL
[article]
2021
arXiv
pre-print
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total
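To give a feel for the size of the stated bound, the following minimal sketch evaluates its two leading terms for one choice of M, d, k and T, ignoring constants and logarithmic factors. The comparison baseline is an assumption not taken from the text above: it treats the M bandits as independent d-dimensional problems, each at the standard d√T rate. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def mtlr_oful_rate(M, d, k, T):
    """Leading terms of the stated bound, M*sqrt(d*k*T) + d*sqrt(k*M*T).

    Constants and log factors hidden by the O-tilde notation are dropped.
    """
    return M * math.sqrt(d * k * T) + d * math.sqrt(k * M * T)


def independent_rate(M, d, T):
    """Assumed baseline: M separate d-dimensional linear bandits,
    each at the standard d*sqrt(T) rate (not stated in the text above)."""
    return M * d * math.sqrt(T)


# Hypothetical problem sizes with k much smaller than d and M.
M, d, k, T = 100, 50, 3, 10_000

shared = mtlr_oful_rate(M, d, k, T)
separate = independent_rate(M, d, T)

print(f"shared representation: {shared:.0f}")
print(f"independent tasks:     {separate:.0f}")
print(f"ratio:                 {shared / separate:.3f}")
```

Under these assumptions the ratio of the two rates simplifies to √(k/d) + √(k/M), so the gain from sharing grows as k becomes small relative to both d and M.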
arXiv:2102.04132v1
fatcat:xkp3fbqsorhsjde4x5pti4y7me