A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Gradient Coding: Avoiding Stragglers in Distributed Learning
2017
International Conference on Machine Learning
We propose a novel coding-theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in Python (using MPI) to run on Amazon EC2, and compare them against baseline approaches in running time and generalization error.
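The idea can be illustrated with a small simulation. The following is a minimal sketch, not the paper's MPI implementation: it simulates the fractional repetition construction in NumPy, collapsing each worker's set of partitions into a single replicated block, so each of n workers returns one partial-gradient sum and the master recovers the full gradient from any n - s workers. All names and parameter values (n = 6, s = 2, the random "gradients") are illustrative assumptions, not from the paper.

import numpy as np

rng = np.random.default_rng(0)

n = 6          # number of workers (assumption; any n with (s + 1) | n works)
s = 2          # number of stragglers to tolerate
d = 4          # gradient dimension (illustrative)
num_blocks = n // (s + 1)

# Each block's "gradient" is a random vector standing in for the sum of
# partial gradients over that block's data partitions.
block_grads = rng.normal(size=(num_blocks, d))
true_grad = block_grads.sum(axis=0)

# Fractional repetition assignment: worker w holds block (w mod num_blocks),
# so every block is replicated across s + 1 workers.
assignment = [w % num_blocks for w in range(n)]

# Each worker's message is the gradient sum of its assigned block.
worker_msgs = [block_grads[assignment[w]] for w in range(n)]

# Pick s stragglers at random; their messages never arrive.
stragglers = set(rng.choice(n, size=s, replace=False).tolist())

# Decoding: s stragglers can remove at most s of the s + 1 replicas of any
# block, so at least one replica of each block survives. The master sums
# one surviving message per block to recover the exact full gradient.
recovered = np.zeros(d)
for b in range(num_blocks):
    survivor = next(w for w in range(n)
                    if assignment[w] == b and w not in stragglers)
    recovered += worker_msgs[survivor]

assert np.allclose(recovered, true_grad)
print("full gradient recovered despite stragglers", sorted(stragglers))

The replication factor s + 1 is what buys the straggler tolerance: the master never waits for the slowest s workers, yet still computes the exact (uncoded) gradient sum.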
dblp:conf/icml/TandonLDK17
fatcat:3zj54ersbrervbbtxs4wbb2xwe