A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2022. The file type is application/pdf.
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension
[article]
2022
arXiv
pre-print
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem;
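The two-force update described in the abstract can be illustrated with a minimal sketch: each worker takes an ordinary gradient step and is additionally pulled toward the parameters of the currently best-performing worker (the leader). This is only an illustrative toy example, not the paper's exact formulation; the quadratic objective, the step size `lr`, the pull weight `pull_strength`, and the synchronous leader selection are all assumptions made for the sketch.

```python
# Illustrative sketch of a leader-guided update (assumed form, toy objective).
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    # Toy quadratic standing in for a deep model's training loss.
    return 0.5 * float(np.sum(x ** 2))

def grad(x):
    return x

n_workers, dim = 4, 10
workers = [rng.normal(size=dim) for _ in range(n_workers)]

lr = 0.1             # step size for the regular gradient step
pull_strength = 0.5  # weight of the corrective pull toward the leader (assumed hyperparameter)

for step in range(100):
    # The leader is the currently best-performing worker.
    leader = min(workers, key=loss).copy()
    # Each worker combines a gradient step with a pull toward the leader's parameters.
    workers = [
        x - lr * grad(x) - lr * pull_strength * (x - leader)
        for x in workers
    ]

print("best loss after training:", min(loss(x) for x in workers))
```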
arXiv:1905.10395v5
fatcat:2ygkz5ewzzbi7amb67cxq7ufju