
Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning [article]

Rawad Bitar, Mary Wootters, Salim El Rouayheb
2019 arXiv   pre-print
In this work we propose an approximate gradient coding scheme called Stochastic Gradient Coding (SGC), which works when the stragglers are random.  ...  We consider distributed gradient descent in the presence of stragglers.  ...
arXiv:1905.05383v1 fatcat:hbbegyseyjgnvjvhb5tqwj5qsy
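As a rough illustration of the scheme sketched in this abstract, the following is a minimal simulation of approximate gradient coding with random stragglers: each data point is replicated on d randomly chosen workers, each surviving worker returns a normalized partial gradient sum, and the server simply adds whatever arrives. The replication factor, the 1/(d(1-p)) normalization, and the least-squares objective are illustrative assumptions on our part, not the exact SGC construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, d, p_straggle = 10, 3, 0.3          # d = replication factor (assumed)
X, y = rng.normal(size=(60, 5)), rng.normal(size=60)
w = rng.normal(size=5)

# Per-sample least-squares gradients (illustrative objective).
per_sample_grads = X * (X @ w - y)[:, None]    # shape (n_samples, dim)
full_grad = per_sample_grads.sum(axis=0)

# Replicate each sample on d distinct workers chosen at random.
assignment = [rng.choice(n_workers, size=d, replace=False) for _ in range(len(y))]

# Each surviving worker returns the normalized sum of its samples' gradients;
# the 1/(d*(1-p)) scaling makes the server-side sum unbiased, since each of the
# d copies of a sample survives with probability 1-p.
survive = rng.random(n_workers) > p_straggle
estimate = np.zeros_like(w)
for i, workers in enumerate(assignment):
    for j in workers:
        if survive[j]:
            estimate += per_sample_grads[i] / (d * (1 - p_straggle))

print("relative error of the aggregated estimate:",
      np.linalg.norm(estimate - full_grad) / np.linalg.norm(full_grad))
```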

Distributed Stochastic Gradient Descent Using LDGM Codes [article]

Shunsuke Horii, Takahiro Yoshida, Manabu Kobayashi, Toshiyasu Matsushima
2019 arXiv   pre-print
Recently, a coding-theoretic framework named Gradient Coding (GC) for mitigating stragglers in distributed learning has been established by Tandon et al.  ...  On the other hand, if the Stochastic Gradient Descent (SGD) algorithm is used, it is not necessary to completely recover the gradient information, and an unbiased estimator of it is sufficient for the learning  ...
arXiv:1901.04668v1 fatcat:cwrjcs3lija2zoccknceovbqp4
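The observation in this snippet, that SGD needs only an unbiased gradient estimate rather than the exact gradient, can be checked with a small Monte Carlo experiment. The sketch below uses a generic inverse-probability reweighting of the surviving workers' partial gradients (our own toy setup, not the paper's LDGM construction) and verifies that the aggregate matches the full gradient in expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, p_survive = 8, 0.7
partial_grads = rng.normal(size=(n_workers, 4))   # worker j holds partial gradient g_j
full_grad = partial_grads.sum(axis=0)

# Monte Carlo over random straggler patterns: each worker responds w.p. p_survive,
# and the server reweights received partial gradients by 1 / p_survive, so that
# E[estimate] = sum_j p_survive * g_j / p_survive = full gradient.
trials = 20000
acc = np.zeros(4)
for _ in range(trials):
    responded = rng.random(n_workers) < p_survive
    acc += partial_grads[responded].sum(axis=0) / p_survive

print("empirical mean of the estimate:", np.round(acc / trials, 3))
print("full gradient:                 ", np.round(full_grad, 3))
```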

Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [article]

Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund, H. Vincent Poor
2020 arXiv   pre-print
To address two critical challenges in distributed networks, i.e., the communication bottleneck and straggler nodes (nodes with slow responses), an error-control-coding-based stochastic incremental ADMM is  ...  A class of mini-batch stochastic alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.  ...  For instance, in [23], gradient coding (GC) based on maximum distance separable (MDS) codes was first proposed to mitigate the effect of stragglers in distributed GD.  ...
arXiv:2010.00914v1 fatcat:o7oy4w4hznehtok35l546kard4

Distributed Optimization using Heterogeneous Compute Systems [article]

Vineeth S
2021 arXiv   pre-print
A naive implementation of synchronous distributed training will result in the faster workers waiting for the slowest worker to complete processing.  ...  Code is available at the repository: .  ...
arXiv:2110.08941v1 fatcat:yqtgyepicnh5toxzggixxvslrq

Gradient Coding via the Stochastic Block Model [article]

Zachary Charles, Dimitris Papailiopoulos
2018 arXiv   pre-print
Gradient coding is a new technique for mitigating the effect of stragglers via algorithmic redundancy.  ...  In this work, we present the stochastic block code (SBC), a gradient code based on the stochastic block model.  ...  Gradient coding, a straggler-mitigating technique for distributed gradient-based methods, was first proposed in [8] and was later extended in [9].  ...
arXiv:1805.10378v1 fatcat:fyoxrnqelreexg3ywfhdegxgvm
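The snippet only names the construction, so the following is a hedged sketch of the general idea rather than the paper's SBC code: draw a symmetric adjacency matrix from a stochastic block model, use row j as the set of data partitions assigned to worker j, and decode approximately by averaging whatever the non-stragglers return. The block sizes, edge probabilities, and the plain averaging decoder are illustrative assumptions on our part.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes, p_in, p_out = [5, 5], 0.8, 0.1          # two communities (assumed parameters)
n = sum(sizes)
labels = np.repeat(np.arange(len(sizes)), sizes)

# Sample a symmetric SBM adjacency matrix (with self-loops so each worker
# always holds its "own" partition).
probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
upper = rng.random((n, n)) < probs
adj = np.triu(upper, 1)
adj = adj | adj.T | np.eye(n, dtype=bool)

# Worker j computes the sum of the partition gradients indexed by adj[j].
partition_grads = rng.normal(size=(n, 3))
worker_msgs = adj @ partition_grads

# Approximate decoding: average the messages from non-straggling workers,
# rescaled by n over the average number of partitions held per worker.
alive = rng.random(n) > 0.2
decoded = worker_msgs[alive].mean(axis=0) * n / adj.sum(axis=1).mean()
print("approximate decode:", np.round(decoded, 2))
print("true gradient sum: ", np.round(partition_grads.sum(axis=0), 2))
```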

Stochastic Coded Federated Learning with Convergence and Privacy Guarantees [article]

Yuchang Sun, Jiawei Shao, Songze Li, Yuyi Mao, Jun Zhang
2022 arXiv   pre-print
This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL).  ...  In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates  ...
arXiv:2201.10092v5 fatcat:p5ffttpt4bg5fib5da6yhyie3y
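The abstract describes the aggregation step only at a high level: responding clients contribute mini-batch gradients, and the server adds a make-up term so the aggregate stays unbiased. The sketch below shows one simple way such a make-up term can work, using a control-variate-style correction with server-side surrogate gradients standing in for the paper's coded-data construction; the actual SCFL coding and privacy mechanism are not reproduced here, and the independent-straggler model is our assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n_clients, dim, q = 12, 4, 0.6               # q = probability a client responds (assumed)
client_grads = rng.normal(size=(n_clients, dim))                     # g_i held by clients
server_surrogates = client_grads + rng.normal(scale=0.5, size=(n_clients, dim))
full_grad = client_grads.sum(axis=0)

def aggregate(responded):
    # Reweighted client contributions plus a server-side make-up term.
    # Unbiased for ANY surrogate h_i:  E[sum_resp (g_i - h_i)/q] + sum_i h_i = sum_i g_i.
    correction = (client_grads[responded] - server_surrogates[responded]).sum(axis=0) / q
    return correction + server_surrogates.sum(axis=0)

# Check unbiasedness empirically over random straggler patterns.
est = np.mean([aggregate(rng.random(n_clients) < q) for _ in range(20000)], axis=0)
print("empirical mean:", np.round(est, 3))
print("full gradient: ", np.round(full_grad, 3))
```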

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding [article]

Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
2019 arXiv   pre-print
We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.  ...  Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes.  ...
arXiv:1901.09671v1 fatcat:lkhtdq5lhjb3phyxswv5jblbem
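The delay benefit claimed here comes from not having to wait for every worker. Under a toy runtime model (shifted-exponential worker times, an assumption of ours rather than ErasureHead's system model), the sketch below compares the per-iteration time of waiting for all n workers against stopping once the fastest n - s have replied, which is what approximate gradient coding permits.

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers, n_stragglers_tolerated, iters = 50, 5, 10000

# Toy runtime model (assumed): each worker's per-iteration time is shifted exponential.
times = 1.0 + rng.exponential(scale=1.0, size=(iters, n_workers))

wait_for_all = times.max(axis=1)                           # uncoded synchronous GD
k = n_workers - n_stragglers_tolerated
wait_for_fastest_k = np.sort(times, axis=1)[:, k - 1]      # stop after the fastest k replies

print("mean iteration time, wait for all   :", wait_for_all.mean())
print("mean iteration time, fastest k of n :", wait_for_fastest_k.mean())
```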

Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning [article]

Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran
2020 arXiv   pre-print
In this paper, we develop a communication-efficient gradient coding framework to overcome these drawbacks.  ...  Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow-running machines  ...  Using coding-theoretic ideas to mitigate stragglers has gained significant research attention; see, e.g., [3]-[6] for distributed computing and [7]-[18] for distributed learning.  ...
arXiv:2005.07184v1 fatcat:izqx26qvwzbclhl2vyg7ifeh7i

Gradient Coding: Avoiding Stragglers in Distributed Learning

Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis
2017 International Conference on Machine Learning  
We propose a novel coding-theoretic framework for mitigating stragglers in distributed learning.  ...  We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent.  ...
dblp:conf/icml/TandonLDK17 fatcat:3zj54ersbrervbbtxs4wbb2xwe
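The abstract's claim that replicating data blocks and coding across gradients gives straggler tolerance can be made concrete with the small example commonly used to introduce gradient coding: three workers, three data partitions, each partition stored on two workers, and coded messages chosen so that the full gradient sum is a fixed linear combination of any two messages. The encoding matrix below is one standard choice for (n, s) = (3, 1); it is meant as an illustration of the framework, not a reproduction of the paper's general construction.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
g = rng.normal(size=(3, 4))            # gradients of the 3 data partitions
target = g.sum(axis=0)

# Encoding matrix B: worker j sends B[j] @ g.  Each row touches 2 partitions,
# i.e. each partition is replicated on 2 of the 3 workers (tolerates s = 1 straggler).
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])
messages = B @ g

# Decoding: for any 2 surviving workers, find a with a^T B[survivors] = [1, 1, 1],
# so that a combines the two messages into the exact gradient sum.
for survivors in combinations(range(3), 2):
    a, *_ = np.linalg.lstsq(B[list(survivors)].T, np.ones(3), rcond=None)
    decoded = a @ messages[list(survivors)]
    print(survivors, np.allclose(decoded, target))
```

Each of the three possible straggler patterns yields the exact gradient sum, which is the exact-recovery guarantee the abstract refers to.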

Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers [article]

Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran
2019 arXiv   pre-print
Gradient coding is a coding-theoretic framework to mitigate stragglers by enabling the server to recover the gradient sum in the presence of stragglers.  ...  Our motivation for constructing codes to mitigate adversarial stragglers stems from the challenge of tackling stragglers in massive-scale elastic and serverless systems, wherein it is difficult to statistically  ...  A coding-theoretic framework for mitigating stragglers in distributed gradient-based learning methods was first proposed in [3]. The setup consists of N worker machines and a parameter server.  ...
arXiv:1904.13373v1 fatcat:rnajawqdxbgl5akyxsgj4grja4

OverSketched Newton: Fast Convex Optimization for Serverless Systems [article]

Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran
2020 arXiv   pre-print
For both cases, we establish convergence guarantees for OverSketched Newton and empirically validate our results by solving large-scale supervised learning problems on real-world datasets.  ...  Experiments demonstrate a reduction of ~50% in total running time on AWS Lambda, compared to state-of-the-art distributed optimization schemes.  ...
arXiv:1903.08857v3 fatcat:luuf7mcdm5dk3l4bhpmf4khmq4

Communication-Efficient Edge AI: Algorithms and Systems [article]

Yuanming Shi, Kai Yang, Tao Jiang, Jun Zhang, Khaled B. Letaief
2020 arXiv   pre-print
This is driven by the explosive growth of data, advances in machine learning (especially deep learning), and easy access to vastly powerful computing resources.  ...  In this paper, we present a comprehensive survey of the recent developments in various techniques for overcoming these communication challenges.  ...
arXiv:2002.09668v1 fatcat:nhasdzb7t5dt5brs2r7ocdzrnm

Coding for Large-Scale Distributed Machine Learning

Ming Xiao, Mikael Skoglund
2022 Entropy  
For large-scale distributed learning systems, significant challenges have appeared in terms of delay, errors, efficiency, etc.  ...  Then, we introduce random coding for gradient-based DML.  ...  In such a scenario, gradient coding [34] can be used to mitigate the effect of straggler nodes.  ...
doi:10.3390/e24091284 fatcat:ul4lu6xty5cwbnsop5ccv7ns64

Optimization-based Block Coordinate Gradient Coding [article]

Qi Wang, Ying Cui, Chenglin Li, Junni Zou, Hongkai Xiong
2021 arXiv   pre-print
This paper considers a distributed computation system consisting of one master and N workers characterized by a general partial straggler model and focuses on solving a general large-scale machine learning  ...  Existing gradient coding schemes introduce identical redundancy across the coordinates of gradients and hence cannot fully utilize the computation results from partial stragglers.  ...  Recently, several coding-based distributed computation techniques have been developed to mitigate the effect of stragglers in training the model via gradient descent algorithms.  ... 
arXiv:2109.08933v1 fatcat:4lfpbhaep5gxffhrignjv2ow3m

Robust Gradient Descent via Moment Encoding with LDPC Codes [article]

Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar
2019 arXiv   pre-print
We show that, for a random model for stragglers, the proposed moment-encoding-based gradient descent method can be viewed as a stochastic gradient descent method.  ...  This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors.  ...
arXiv:1805.08327v2 fatcat:bghtp26hhjbutjnxx6jffb3e5q
Showing results 1 — 15 out of 298 results