Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks [article]

Rama Cont, Alain Rossier, Renyuan Xu
2022 · arXiv preprint
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
arXiv:2204.07261v2
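The setting described in the abstract — a deep residual network of constant layer width with a smooth activation, trained by full-batch gradient descent on a quadratic loss — can be sketched as follows. This is a minimal illustration, not the authors' code: the 1/L scaling of the residual blocks, the tanh activation, and all dimensions and hyperparameters here are assumptions chosen to make the depth-scaling setup concrete.

```python
import numpy as np

# Hypothetical setup: width d, depth L, learning rate, and step count are
# illustrative choices, not values from the paper.
rng = np.random.default_rng(0)
d, L, lr, steps = 4, 16, 0.1, 200

W = 0.1 * rng.standard_normal((L, d, d))  # one weight matrix per residual layer
x = rng.standard_normal((d, 32))          # 32 training inputs of width d
y = rng.standard_normal((d, 32))          # regression targets

def forward(W, x):
    """Residual forward pass: h_{l+1} = h_l + (1/L) * tanh(W_l h_l)."""
    hs = [x]
    for l in range(L):
        x = x + (1.0 / L) * np.tanh(W[l] @ x)
        hs.append(x)
    return x, hs

def loss(W, x, y):
    out, _ = forward(W, x)
    return 0.5 * np.mean((out - y) ** 2)

def grad(W, x, y):
    """Backpropagation through the residual layers by hand."""
    out, hs = forward(W, x)
    g = (out - y) / y.size           # d(loss)/d(out) for the mean-squared loss
    gW = np.zeros_like(W)
    for l in reversed(range(L)):
        pre = W[l] @ hs[l]
        s = (1.0 / L) * (1.0 - np.tanh(pre) ** 2) * g  # through the tanh branch
        gW[l] = s @ hs[l].T
        g = g + W[l].T @ s           # identity skip + branch contribution
    return gW

# Full-batch gradient descent on the weights.
losses = [loss(W, x, y)]
for _ in range(steps):
    W -= lr * grad(W, x, y)
    losses.append(loss(W, x, y))

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Under the paper's assumptions (smooth activation, constant width), gradient descent in such a setting is proved to converge linearly; the loss trajectory recorded in `losses` lets one inspect that decay empirically.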