Reducing communication overhead in distributed learning by an order of magnitude (almost)

Anders Oland, Bhiksha Raj
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Large-scale distributed learning plays an ever-increasing role in modern computing. However, whether using a compute cluster with thousands of nodes or a single multi-GPU machine, the most significant bottleneck is communication. In this work, we explore the effects of applying quantization and encoding to the parameters of distributed models. We show that, for a neural network, this can be done without slowing down convergence or hurting the generalization of the model. In fact, in our experiments we were able to reduce the communication overhead by nearly an order of magnitude, while actually improving the generalization accuracy.
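The abstract's core idea, compressing model parameters via quantization before communication, can be illustrated with a minimal sketch. This is a generic uniform 8-bit quantizer, not the paper's specific scheme; the function names and the bit width are illustrative assumptions, and the entropy-encoding step that would push savings further is omitted.

```python
import numpy as np

def quantize(params, num_bits=8):
    # Uniform quantization: map float32 values onto num_bits-wide integers.
    # Only the integer codes plus two floats (offset, scale) are transmitted.
    lo, hi = float(params.min()), float(params.max())
    scale = (hi - lo) / (2 ** num_bits - 1)
    codes = np.round((params - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # Reconstruct approximate float32 parameters on the receiving node.
    return codes.astype(np.float32) * scale + lo

params = np.random.randn(10_000).astype(np.float32)
codes, lo, scale = quantize(params)
restored = dequantize(codes, lo, scale)

# 8-bit codes are 4x smaller than float32 before any further encoding;
# per-element quantization error is bounded by roughly scale / 2.
```

Entropy coding of the integer codes (e.g., Huffman coding, since quantized weights are far from uniformly distributed) is what would close the gap between this 4x and the near-10x reduction the paper reports.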
doi:10.1109/icassp.2015.7178365
dblp:conf/icassp/OlandR15