Rates of Convergence for Sparse Variational Gaussian Process Regression [article]

David Burt, Carl Rasmussen, Mark Van Der Wilk, Apollo-University Of Cambridge Repository
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the O(N³) scaling with dataset size N. They reduce the computational cost to O(NM²), with M≪N being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in N, the true complexity of the algorithm depends on how M must increase to ensure a certain quality of approximation. We address this by characterising the behavior of an upper bound on
the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing M more slowly than N. A particular case of interest is that for regression with normally distributed inputs in D dimensions with the popular Squared Exponential kernel, M = O(log^D N) is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase M in continual learning scenarios.
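The growth rule in the abstract can be illustrated with a minimal sketch, under several assumptions that are not from the article: the helper names (`se_kernel`, `nystrom_trace_residual`, `num_inducing`) and the constant `c` are hypothetical, and the inducing inputs are taken as a plain subset of the training inputs rather than by any particular selection scheme. The sketch computes the trace of the Nyström residual, tr(Kff - Kfu Kuu^{-1} Kuf), a common measure of how well M inducing points summarise the kernel matrix, at O(NM²) cost and without forming the full N×N matrix.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared Exponential kernel matrix between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def nystrom_trace_residual(x, z, variance=1.0, jitter=1e-6):
    """Return tr(Kff - Kfu Kuu^{-1} Kuf) in O(N M^2) time.

    Only the diagonal of Kff is needed, and for the SE kernel every
    diagonal entry equals the kernel variance.
    """
    kuu = se_kernel(z, z, variance=variance) + jitter * np.eye(len(z))
    kuf = se_kernel(z, x, variance=variance)
    chol = np.linalg.cholesky(kuu)
    a = np.linalg.solve(chol, kuf)  # shape (M, N); column norms give diag(Qff)
    q_diag = np.sum(a**2, axis=0)
    return float(np.sum(variance - q_diag))


def num_inducing(n, d, c=2.0):
    """Hypothetical rule M = ceil(c * log(N)^D); c is an assumed constant."""
    return int(np.ceil(c * np.log(n) ** d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 1
    for n in (100, 1000, 10000):
        x = rng.standard_normal((n, d))  # normally distributed inputs
        m = num_inducing(n, d)
        z = x[:m]  # assumption: first M points as inducing inputs
        print(n, m, nystrom_trace_residual(x, z))
```

The choice of `c` and of the inducing inputs strongly affects the residual in practice, so the printed values should be read as a qualitative illustration of M growing much more slowly than N, not as a reproduction of the article's bounds.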
doi:10.17863/cam.45147 fatcat:kpdhvocpt5bgvowr3ern36hcre