How isotropic kernels perform on simple invariants [article]

Jonas Paccolat, Stefano Spigler, Matthieu Wyart
2020 arXiv pre-print
We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on d_∥ variables, fewer than the input dimension d. We compute the expected test error ϵ, which follows ϵ ∼ p^-β, where p is the size of the training set. We find that β ∼ 1/d independently of d_∥, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the stripe model, where the data label depends on a single coordinate, y(x) = y(x_1), corresponding to parallel decision boundaries separating labels of different signs, and we consider that there is no margin at these interfaces. We argue and confirm numerically that for large bandwidth, β = (d-1+ξ)/(3d-3+ξ), where ξ ∈ (0,2) is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality, since β → 1/3 as d → ∞. (iii) We confirm these findings for the spherical model, for which y(x) = y(|x|). (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor λ (an operation believed to take place in deep networks), the test error is reduced by a factor λ^(-2(d-1)/(3d-3+ξ)).
arXiv:2006.09754v5
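
As a rough illustration of the classification setting (ii), the following Python sketch estimates the learning-curve exponent β for the stripe model y(x) = sign(x_1) using a hard-margin-like SVM with an isotropic Laplace kernel (for which ξ = 1). The dimension, bandwidth, training-set sizes and the use of scikit-learn's SVC are illustrative assumptions, not the paper's actual experimental protocol.

# Illustrative sketch (assumed settings, not the authors' protocol): fit the
# learning-curve exponent beta for the stripe model y(x) = sign(x_1) with a
# hard-margin-like SVM and an isotropic Laplace kernel (singularity exponent xi = 1).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.default_rng(0)
d = 5                                   # input dimension (illustrative choice)
bandwidth = 10.0                        # large-bandwidth regime of the abstract
train_sizes = [64, 128, 256, 512, 1024] # training-set sizes p
n_test = 4000

def sample(n):
    # Gaussian inputs; the label depends only on the first coordinate (stripe model),
    # so the data have no margin at the decision boundaries x_1 = 0.
    X = rng.standard_normal((n, d))
    return X, np.sign(X[:, 0])

def laplace_kernel(A, B):
    # Isotropic Laplace kernel exp(-|a - b| / sigma) with Euclidean distance.
    return np.exp(-euclidean_distances(A, B) / bandwidth)

X_test, y_test = sample(n_test)
errors = []
for p in train_sizes:
    X_train, y_train = sample(p)
    clf = SVC(kernel=laplace_kernel, C=1e6)   # very large C mimics a hard-margin SVM
    clf.fit(X_train, y_train)
    errors.append(np.mean(clf.predict(X_test) != y_test))

# Fit the test error to epsilon ~ p^-beta on a log-log scale.
beta_fit = -np.polyfit(np.log(train_sizes), np.log(errors), 1)[0]
xi = 1.0                                # Laplace kernel singularity exponent
beta_pred = (d - 1 + xi) / (3 * d - 3 + xi)
print(f"fitted beta ~ {beta_fit:.2f}, predicted (d-1+xi)/(3d-3+xi) = {beta_pred:.2f}")

With ξ = 1 and d = 5 the predicted exponent is 5/13 ≈ 0.38; the fit over a handful of training-set sizes is only meant to show how β is read off a learning curve, not to reproduce the paper's numerical results.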