DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training [article]
2019 · arXiv pre-print
In this work we propose DC-S3GD, a decentralized (without Parameter Server) stale-synchronous version of the Delay-Compensated Asynchronous Stochastic Gradient Descent (DC-ASGD) algorithm. ...
Data parallelism has become the de facto standard for training Deep Neural Networks on multiple processing units. ...
In recent years, large-scale training has been achieved with variants of the classic synchronous scheme, Synchronous SGD, in conjunction with decentralized communication. ...
arXiv:1911.02516v1
fatcat:uev2oh4qjref7kezoxj4bsepxa
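As context for this entry, the following is a minimal sketch of the delay-compensated gradient correction from DC-ASGD, which DC-S3GD adapts to a decentralized, stale-synchronous setting: a stale gradient is corrected toward the gradient at the current weights using an elementwise approximation of the Hessian. Function and variable names (`delay_compensated_gradient`, `lam`) are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def delay_compensated_gradient(stale_grad, w_current, w_stale, lam=0.04):
    """Approximate the gradient at the current weights from a stale gradient.

    Applies the DC-ASGD-style correction g + lam * g * g * (w_current - w_stale),
    where the elementwise g * g term stands in for the diagonal of the Hessian.
    `lam` is an illustrative variance-control hyperparameter.
    """
    return stale_grad + lam * stale_grad * stale_grad * (w_current - w_stale)

# Illustrative usage: a worker computed gradient `g` against weights `w_stale`,
# but the model has since moved to `w_current`; compensate before the SGD step.
rng = np.random.default_rng(0)
w_stale = rng.normal(size=1000)
w_current = w_stale + 0.01 * rng.normal(size=1000)
g = rng.normal(size=1000)

lr = 0.1
w_next = w_current - lr * delay_compensated_gradient(g, w_current, w_stale)
```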
Adaptive Braking for Mitigating Gradient Delay [article]
2020 · arXiv pre-print
Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. ...
We show that applying AB on top of SGD with momentum enables training ResNets on CIFAR-10 and ImageNet-1k with delays D ≥ 32 update steps with minimal drop in final test accuracy. ...
Acknowledgements We thank Joel Hestness, Vithursan Thangarasa, and Xin Wang for their help and feedback that improved the manuscript. ...
arXiv:2007.01397v2
fatcat:wglfxomtn5hjpm67zjlkfa2m7q
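For context on the problem this entry addresses, here is a minimal sketch of what a gradient delay of D update steps means for SGD with momentum: the gradient applied at step t was computed against the weights from step t - D. This illustrates the staleness that Adaptive Braking is designed to mitigate; it does not implement the AB rule itself, and all names are illustrative assumptions.

```python
from collections import deque
import numpy as np

def delayed_sgd_momentum(grad_fn, w0, lr=0.01, beta=0.9, delay=32, steps=200):
    """Run SGD with momentum where each applied gradient is `delay` steps stale.

    `grad_fn(w)` returns the gradient at weights `w`. For the first `delay`
    steps the gradient of the initial weights is applied, mimicking gradients
    that are still "in flight" from slower or pipelined workers.
    """
    w = w0.copy()
    v = np.zeros_like(w)
    pending = deque([w0.copy()] * delay)   # weights whose gradients are in flight
    for _ in range(steps):
        w_stale = pending.popleft()        # weights the arriving gradient was computed on
        g = grad_fn(w_stale)               # stale gradient, `delay` steps old
        v = beta * v + g                   # standard heavy-ball momentum
        w = w - lr * v
        pending.append(w.copy())           # current weights enter the pipeline
    return w

# Illustrative usage on a toy quadratic loss f(w) = ||w||^2 (gradient 2w).
# Depending on lr, beta, and delay, the stale gradients can make the iterates
# oscillate or diverge; this is the instability AB aims to mitigate.
w_final = delayed_sgd_momentum(lambda w: 2.0 * w, w0=np.ones(4), delay=32)
print(w_final)
```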