A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
An Adaptive Remote Stochastic Gradient Method for Training Neural Networks
[article]
2020
arXiv pre-print
RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is computationally and memory efficient, and is straightforward to implement. ...
In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed while surpassing SGD in generalization. ...
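The snippet above names the algorithm but not its update rule. As a rough sketch of what a remote-gradient method with adaptive preconditioning can look like: the stochastic gradient is evaluated at a look-ahead ("remote") point extrapolated along the momentum direction rather than at the current iterate, and an Adam-style second-moment estimate scales the step. This is an assumption-based illustration only; the function and parameter names below (remote_adaptive_step, lr, mu, beta2, eps) and the exact look-ahead formula are not taken from the paper.

import numpy as np

def remote_adaptive_step(x, m, v, grad_fn, t,
                         lr=1e-3, mu=0.9, beta2=0.999, eps=1e-8):
    """One illustrative step of a remote-gradient method with
    Adam-style adaptivity; a sketch, not the paper's ARSG update."""
    # "Remote" gradient: evaluate at a look-ahead point along the
    # momentum direction instead of at the current iterate x.
    g = grad_fn(x - lr * mu * m)
    m = mu * m + (1.0 - mu) * g            # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * g * g  # second-moment estimate
    m_hat = m / (1.0 - mu ** t)            # Adam-style bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Toy usage on f(x) = x**2, whose gradient is 2x; each step moves x
# toward the minimum at 0:
x, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    x, m, v = remote_adaptive_step(x, m, v, lambda z: 2 * z, t)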
Introduction and related work
First-order optimization methods, e.g. [30, 28, 2, 31, 15], are competitive workhorses for training neural networks. Compared with second-order methods, e.g. ...
arXiv:1905.01422v8
fatcat:7l5cewu2dne4rg3xujhqf3ad7q