
DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training [article]

Alessandro Rigazzi
2019 arXiv pre-print
In this work we propose DC-S3GD, a decentralized (without Parameter Server) stale-synchronous version of the Delay-Compensated Asynchronous Stochastic Gradient Descent (DC-ASGD) algorithm. ... Data parallelism has become the de facto standard for training Deep Neural Networks on multiple processing units. ... In recent years, large-scale training was obtained by using different flavors of the most classic synchronous scheme, that is Synchronous SGD, in conjunction with decentralized communication. ...
arXiv:1911.02516v1 fatcat:uev2oh4qjref7kezoxj4bsepxa
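DC-ASGD, of which DC-S3GD is a stale-synchronous variant, corrects a stale gradient g(w_stale) toward the current weights with a first-order Taylor term, approximating the Hessian by the diagonal of the gradient outer product: g(w_now) ≈ g + λ · g ⊙ g ⊙ (w_now − w_stale). The NumPy sketch below shows only that compensation step, under the assumption of element-wise arrays; the function name `delay_compensated_grad` and the λ value in the example are illustrative, and it does not model DC-S3GD's decentralized communication.

```python
import numpy as np

def delay_compensated_grad(g_stale, w_now, w_stale, lam):
    """Approximate the gradient at w_now from a gradient computed at w_stale.

    Hypothetical helper illustrating the DC-ASGD compensation term:
    the Hessian is replaced by the cheap diagonal proxy g * g (element-wise),
    scaled by the hyperparameter lam.
    """
    g_stale = np.asarray(g_stale, dtype=float)
    drift = np.asarray(w_now, dtype=float) - np.asarray(w_stale, dtype=float)
    # First-order correction: g + lam * (g ⊙ g) ⊙ (w_now - w_stale)
    return g_stale + lam * g_stale * g_stale * drift

# Example: a gradient computed two parameter updates ago, corrected to now.
g = delay_compensated_grad([1.0, -2.0], w_now=[0.5, 0.5], w_stale=[0.0, 1.0], lam=0.1)
print(g)
```

With lam = 0 the function returns the stale gradient unchanged, which is plain asynchronous SGD; larger lam trusts the diagonal curvature proxy more.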

Adaptive Braking for Mitigating Gradient Delay [article]

Abhinav Venigalla and Atli Kosson and Vitaliy Chiley and Urs Köster
2020 arXiv pre-print
Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. ... We show that applying AB on top of SGD with momentum enables training ResNets on CIFAR-10 and ImageNet-1k with delays D ≥ 32 update steps with minimal drop in final test accuracy. ... Acknowledgements We thank Joel Hestness, Vithursan Thangarasa, and Xin Wang for their help and feedback that improved the manuscript. ...
arXiv:2007.01397v2 fatcat:wglfxomtn5hjpm67zjlkfa2m7q
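The delay D above counts how many update steps old the weights are when a gradient is computed. The sketch below models only that delayed-gradient setting for SGD with heavy-ball momentum on a toy objective; it is not Adaptive Braking itself, and `train_with_delay` and its parameters are hypothetical names chosen for illustration.

```python
from collections import deque

import numpy as np

def train_with_delay(grad_fn, w0, lr, momentum, delay, steps):
    """SGD with momentum where step t uses the gradient at w_{t - delay}.

    Weights from before step 0 are taken to be w0 (a simplifying assumption).
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    # Holds w_{t-delay}, ..., w_t; the oldest entry is the stale copy.
    history = deque([w.copy()] * (delay + 1), maxlen=delay + 1)
    for _ in range(steps):
        w_stale = history[0]          # weights from `delay` steps ago
        g = grad_fn(w_stale)          # gradient evaluated on stale weights
        v = momentum * v + g          # heavy-ball velocity update
        w = w - lr * v                # parameter update on current weights
        history.append(w.copy())      # oldest entry drops out automatically
    return w

# Toy quadratic f(w) = 0.5 * w**2, whose gradient is w itself.
quad_grad = lambda w: w
print(train_with_delay(quad_grad, [1.0], lr=0.5, momentum=0.0, delay=1, steps=3))
```

Setting delay=0 recovers ordinary SGD with momentum; increasing it lets the velocity keep pushing in a direction computed from outdated weights, which is the effect delay-mitigation methods such as AB target.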