Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
[article] · 2020 · arXiv pre-print
Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest open issues in the deep learning literature. Towards answering this question, we evaluate the excess risk of a deep learning estimator trained by noisy gradient descent with ridge regularization on a mildly overparameterized neural network, and discuss its superiority to a class of linear estimators that includes the neural tangent kernel approach and the random feature model …
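The abstract names the training procedure only at a high level. As a rough illustration, a noisy (Langevin-type) gradient descent step with ridge regularization on a small two-layer network might look like the sketch below; the data model, network sizes, and hyperparameters here are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical; the paper's data model is not given in this snippet).
n, d, m = 200, 5, 50              # samples, input dim, hidden width ("mildly overparameterized")
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Two-layer network f(x) = a . tanh(W x).
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)

eta, lam, beta = 1e-2, 1e-3, 1e4  # step size, ridge weight, inverse temperature (all assumed)

for step in range(2000):
    H = np.tanh(X @ W.T)          # (n, m) hidden activations
    resid = H @ a - y             # (n,) residuals
    # Gradients of the ridge-regularized squared loss
    # (1/2n)||f(X) - y||^2 + (lam/2)(||W||^2 + ||a||^2).
    grad_a = H.T @ resid / n + lam * a
    grad_W = (resid[:, None] * (1.0 - H**2) * a).T @ X / n + lam * W
    # Noisy gradient descent: a plain GD step plus Gaussian noise of scale sqrt(2*eta/beta).
    a += -eta * grad_a + np.sqrt(2 * eta / beta) * rng.normal(size=a.shape)
    W += -eta * grad_W + np.sqrt(2 * eta / beta) * rng.normal(size=W.shape)

print("train MSE:", np.mean((np.tanh(X @ W.T) @ a - y) ** 2))
```

The injected Gaussian noise is what distinguishes this update from plain gradient descent: it lets the iterates escape bad stationary points of the non-convex objective, which is the mechanism the abstract's "non-convex noisy gradient descent" refers to.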
arXiv:2012.03224v1
fatcat:xawtcfmgvnfanavinz72j2cqg4