An Adaptive Remote Stochastic Gradient Method for Training Neural Networks [article]

Yushu Chen, Hao Jing, Wenlai Zhao, Zhiqiang Liu, Ouyi Li, Liang Qiao, Wei Xue, Guangwen Yang
2020, arXiv pre-print
RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement.  ...  In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed while surpassing SGD in generalization.  ...  Introduction and related work: First-order optimization methods, e.g., [30, 28, 2, 31, 15], are competitive workhorses for training neural networks. Compared with second-order methods, e.g.,  ...
arXiv:1905.01422v8 fatcat:7l5cewu2dne4rg3xujhqf3ad7q
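The abstract above contrasts adaptive first-order methods (ADAM) with plain SGD. As context only, and not the paper's ARSG algorithm, the following is a minimal sketch of a standard Adam update on a scalar parameter; all names and hyperparameter values here are the usual textbook defaults, assumed for illustration.

```python
# Minimal sketch (NOT the paper's ARSG): one standard Adam update step,
# the kind of adaptive first-order method the abstract compares with SGD.
import math

def adam_update(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on a scalar parameter; returns new (theta, m, v)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for m
    v_hat = v / (1 - b2 ** t)             # bias correction for v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_update(theta, 2 * (theta - 3), m, v, t)
print(theta)  # should approach the minimizer at 3.0
```

The per-coordinate scaling by the second-moment estimate is what gives adaptive methods their fast early convergence; the paper's claim is that ARSG keeps that speed while recovering SGD-like generalization.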