Global Momentum Compression for Sparse Communication in Distributed SGD
[article] arXiv pre-print, 2019
With the rapid growth of data, distributed stochastic gradient descent (DSGD) has been widely used for solving large-scale machine learning problems. Due to network latency and limited bandwidth, communication has become the bottleneck of DSGD when training large-scale models such as deep neural networks. Communication compression with sparsified gradients, abbreviated as sparse communication, has been widely used for reducing the communication cost in DSGD. Recently, there has appeared
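To make the sparse-communication idea from the abstract concrete, below is a minimal Python/NumPy sketch of top-k gradient sparsification with local error feedback, one common scheme in this line of work. It is an illustration of the general technique only, not the paper's Global Momentum Compression method; the function name `topk_sparsify` and the parameter `k` are hypothetical choices for this example.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad; zero out the rest.

    Returns the sparse approximation (what a worker would communicate)
    and the residual that is typically accumulated locally and added
    back to the gradient at the next step (error feedback).
    """
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy(), np.zeros_like(grad)
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    residual = flat - sparse  # kept locally for the next round
    return sparse.reshape(grad.shape), residual.reshape(grad.shape)

# Example: a worker sends only ~1% of gradient entries per step.
rng = np.random.default_rng(0)
g = rng.normal(size=10_000)
sparse_g, residual = topk_sparsify(g, k=100)
print(np.count_nonzero(sparse_g))  # 100 nonzero values communicated
```

Because only the k selected values (plus their indices) are transmitted, per-step communication shrinks from the full model dimension to roughly k entries, which is the cost reduction the abstract refers to.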
arXiv:1905.12948v2
fatcat:35bm5h5htnctfnvlaa2vur5rni