Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
[article] · 2018 · arXiv pre-print
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect convergence. In this work we present a novel theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). …
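To make the runtime side of this trade-off concrete, here is a minimal simulation sketch, not code from the paper: it assumes K = 8 learners and exponentially distributed per-gradient compute delays (both illustrative choices, not specified in the abstract), and compares the per-iteration wallclock time of synchronous SGD, which waits for the slowest of K learners, against asynchronous SGD, which updates as soon as the fastest learner finishes.

```python
import random

# Hedged sketch (not the paper's implementation): simulate per-iteration
# wallclock time of synchronous vs. asynchronous distributed SGD with K
# learners whose gradient-computation delays are random. The exponential
# delay model and all constants below are assumptions for illustration.

K = 8             # number of learners (assumed)
ITERS = 10_000    # simulated SGD iterations
MEAN_DELAY = 1.0  # mean per-gradient compute time, arbitrary units (assumed)

random.seed(0)

sync_time = 0.0   # synchronous SGD: each step waits for the slowest learner
async_time = 0.0  # asynchronous SGD: each step proceeds on the first arrival

for _ in range(ITERS):
    delays = [random.expovariate(1.0 / MEAN_DELAY) for _ in range(K)]
    sync_time += max(delays)   # straggler bottleneck: max of K delays
    async_time += min(delays)  # fastest arrival: min of K delays
    # The price of async speed is staleness: the gradient applied now was
    # computed on parameters that may be several updates out of date.

print(f"sync  wallclock / iter: {sync_time / ITERS:.3f}")
print(f"async wallclock / iter: {async_time / ITERS:.3f}")
print(f"runtime speed-up of async: {sync_time / async_time:.1f}x")
```

With exponential (memoryless) delays, the min-of-K draw is an exact model for the time between asynchronous updates; for other delay distributions it is only a rough proxy, and it captures just the runtime half of the trade-off, while the paper's contribution is to weigh that speed-up against the error caused by staleness.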
arXiv:1803.01113v3
fatcat:tehvwbmi6bffhi2hfvv7zvqe5e