A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks [article]
2022, arXiv pre-print
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate [...]
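As a rough illustration of the setting the abstract describes, the sketch below builds a deep residual network with constant layer width and a smooth (tanh) activation and trains it with plain full-batch gradient descent. This is not the paper's exact parametrization: the 1/L block scaling, the Gaussian initialization, and all hyperparameters (L, d, n, lr) are assumptions made here for illustration only.

```python
# Minimal sketch of gradient descent on a deep residual network with
# constant layer width and a smooth activation. Scaling, initialization,
# and hyperparameters are illustrative assumptions, not the paper's setup.
import jax
import jax.numpy as jnp

L = 64          # depth (number of residual blocks)
d = 8           # constant layer width
n = 32          # number of training samples
lr = 0.1        # gradient-descent step size

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (n, d))                 # inputs
Y = jax.random.normal(k2, (n, d))                 # targets
W = 0.1 * jax.random.normal(k3, (L, d, d))        # one weight matrix per layer

def forward(W, x):
    # Residual update h_{l+1} = h_l + (1/L) * tanh(h_l @ W_l); tanh is smooth.
    h = x
    for l in range(L):
        h = h + (1.0 / L) * jnp.tanh(h @ W[l])
    return h

def loss(W):
    # Quadratic training loss over the whole dataset.
    pred = jax.vmap(lambda x: forward(W, x))(X)
    return 0.5 * jnp.mean(jnp.sum((pred - Y) ** 2, axis=-1))

grad_loss = jax.jit(jax.grad(loss))

for step in range(200):
    W = W - lr * grad_loss(W)   # plain (full-batch) gradient descent

print("final loss:", float(loss(W)))
# The weights W[l], viewed as a function of the layer index l/L, are the
# object whose scaling limit (as L grows) the abstract refers to.
```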
arXiv:2204.07261v2
fatcat:mgjnsugs3vgv5fipymihg4xqf4