A Survey of Coded Distributed Computing
[article]
2020
arXiv
pre-print
Distributed computing has become a common approach for large-scale computation due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness. ...
Then, we review and analyze a number of CDC approaches proposed to reduce the communication costs, mitigate the straggler effects, and guarantee privacy and security. ...
The cluster of computers is modelled as a master-worker system consisting of a single master node and multiple workers that store and analyze massive amounts of unstructured data. ...
arXiv:2008.09048v1
fatcat:riy4dxvuc5ae3krz7lf25zkg6m
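To ground the master-worker model the survey builds on, here is a minimal sketch of the canonical coded-computing primitive: an (n, k) MDS-coded matrix-vector multiply in which the master recovers A @ x from any k of n worker results. The Vandermonde encoding and all names below are illustrative assumptions, not the survey's notation.

```python
import numpy as np

def encode_blocks(A, n, k):
    blocks = np.split(A, k)                         # k equal row blocks of A
    G = np.vander(np.arange(1, n + 1.0), k, increasing=True)  # n x k; any k rows invertible
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, G

def decode(results, worker_ids, G, k):
    Gs = G[worker_ids, :]                           # k x k Vandermonde submatrix
    Y = np.stack(results)                           # stacked coded products y_i = coded_i @ x
    return np.linalg.solve(Gs, Y).reshape(-1)       # rows are blocks[j] @ x, in order

A, x = np.random.randn(6, 4), np.random.randn(4)
coded, G = encode_blocks(A, n=5, k=3)
survivors = [0, 2, 4]                               # any 3 of the 5 workers suffice
y = decode([coded[i] @ x for i in survivors], survivors, G, k=3)
assert np.allclose(y, A @ x)
```

Any k rows of a Vandermonde matrix with distinct evaluation points form an invertible system, which is exactly the "any k of n" straggler tolerance the CDC literature exploits.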
Train Where the Data is: A Case for Bandwidth Efficient Coded Training
[article]
2019
arXiv
pre-print
Furthermore, coded computing traditionally relied on a central node to encode and distribute data to all the worker nodes, which is not practical in a distributed mobile setting. ...
But there is a growing interest in enabling training near the data. For instance, mobile devices are rich sources of training data. ...
Straggler Mitigation: Straggler mitigation in distributed computing has received considerable attention and many techniques have been proposed in the literature. ...
arXiv:1910.10283v1
fatcat:5u5evimcbjemrancv2bl6uhiby
Serverless Straggler Mitigation using Local Error-Correcting Codes
[article]
2020
arXiv
pre-print
We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning ...
On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability ...
In Fig. 4, for example, only two blocks need to be read to mitigate a straggler. ...
arXiv:2001.07490v1
fatcat:ptbzh4ld3jezphosqkylgpadni
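The "read only two blocks" property quoted above is the hallmark of locally recoverable codes. A minimal sketch, assuming a toy pairwise construction rather than the paper's actual code: each pair of blocks gets one local parity, so a single straggler per group is rebuilt from just two reads.

```python
import numpy as np

def add_local_parity(blocks):
    assert len(blocks) % 2 == 0
    return [[blocks[i], blocks[i + 1], blocks[i] + blocks[i + 1]]
            for i in range(0, len(blocks), 2)]      # [b1, b2, parity] groups

def recover(group, missing):
    others = [g for j, g in enumerate(group) if j != missing]
    if missing == 2:
        return others[0] + others[1]   # rebuild the parity itself: b1 + b2
    return others[1] - others[0]       # data block = parity - surviving data block

blocks = [np.random.randn(4) for _ in range(4)]
groups = add_local_parity(blocks)
assert np.allclose(recover(groups[0], 0), groups[0][0])   # two reads, one repair
```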
Coded Computation over Heterogeneous Clusters
[article]
2019
arXiv
pre-print
We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reducing the latency of computation. ...
There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks ...
The work in [16] proposes coding schemes for mitigating stragglers in distributed batch gradient computation. ...
arXiv:1701.05973v5
fatcat:wan745p6pbdbldnksc4ifn7bba
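A rough sketch of the redundancy-for-latency trade in heterogeneous clusters (the paper derives the optimal loads analytically; the proportional rule and redundancy factor below are simplifying assumptions): assign coded rows in proportion to each worker's estimated speed so that expected finish times roughly equalize.

```python
def allocate(total_rows, speeds, redundancy=1.5):
    coded_rows = int(total_rows * redundancy)       # extra coded rows mask stragglers
    total_speed = sum(speeds)
    return [round(coded_rows * s / total_speed) for s in speeds]

print(allocate(1000, speeds=[4.0, 2.0, 1.0, 1.0]))  # -> [750, 375, 188, 188]
```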
Efficient Replication for Straggler Mitigation in Distributed Computing
[article]
2020
arXiv
pre-print
Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. ...
Finally, by running experiments on Google cluster traces, we observe that redundancy can reduce the compute time of the jobs in Google clusters by an order of magnitude, and that the optimum level of redundancy ...
ACKNOWLEDGEMENT This research was supported in part by the NSF awards No. CIF-1717314 and CCF-1559855. ...
arXiv:2006.02318v2
fatcat:5vmx235oangghmu6uvoj4h7jpi
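A minimal sketch of task replication as described above, assuming a simulated thread pool stands in for the cluster: each task is launched on r workers and the master keeps whichever copy finishes first.

```python
import concurrent.futures as cf
import random, time

def run_task(task_id):
    time.sleep(random.expovariate(10.0))     # simulated variable worker speed
    return task_id

def first_of_replicas(pool, task_id, r=2):
    futures = [pool.submit(run_task, task_id) for _ in range(r)]
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    for f in pending:
        f.cancel()                           # best effort: drop the slower replicas
    return next(iter(done)).result()

with cf.ThreadPoolExecutor(max_workers=8) as pool:
    results = [first_of_replicas(pool, t, r=2) for t in range(4)]
print(results)
```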
Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange
[article]
2017
arXiv
pre-print
... levels is not available. ...
We then present our approach of work exchange to combat the latency problem, in which faster workers can be reassigned additional leftover computations that were originally assigned to slower workers. ...
At the core of the straggler problem is the heterogeneity of computation across the workers, i.e., different workers in the cluster may have different computational capabilities. ...
arXiv:1711.08452v1
fatcat:nezkg4w6tbdb7ajw6we5fe7bqa
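A toy discrete-time sketch of the work-exchange idea (the scheduling policy below is a made-up illustration, not the paper's protocol): every worker processes chunks at its own speed, and a worker that drains its queue pulls leftover chunks from the most-loaded worker.

```python
import collections

def makespan(num_chunks, speeds, exchange=True):
    queues = [collections.deque() for _ in speeds]
    for c in range(num_chunks):
        queues[c % len(speeds)].append(c)           # equal initial assignment
    steps = 0
    while any(queues):
        steps += 1
        for i, speed in enumerate(speeds):
            for _ in range(speed):
                if not queues[i] and exchange:
                    donor = max(queues, key=len)    # take leftovers from the slowest
                    if donor:
                        queues[i].append(donor.pop())
                if queues[i]:
                    queues[i].popleft()             # process one chunk
    return steps

speeds = [4, 1, 1, 1]                               # one fast, three slow workers
print(makespan(120, speeds, exchange=False))        # bottlenecked by slow workers
print(makespan(120, speeds, exchange=True))         # fast worker absorbs leftovers
```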
Robust Gradient Descent via Moment Encoding with LDPC Codes
[article]
2019
arXiv
pre-print
To mitigate the effect of the stragglers, it has been previously proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. ...
The iterative decoding algorithms for LDPC codes have very low computational overhead and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system ...
Acknowledgements This work is supported in part by National Science Foundation awards CCF 1642658 (CAREER) and CCF 1618512. ...
arXiv:1805.08327v2
fatcat:bghtp26hhjbutjnxx6jffb3e5q
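A simplified sketch of the moment-encoding idea using the simplest possible erasure code, a single parity check, rather than the paper's LDPC construction: because each worker's gradient contribution is linear in its stored moments (X_i^T X_i, X_i^T y_i), a parity worker holding the block sums lets the master tolerate one straggler by subtraction.

```python
import numpy as np

np.random.seed(1)
Xs = [np.random.randn(20, 3) for _ in range(4)]
ys = [np.random.randn(20) for _ in range(4)]
moments = [(X.T @ X, X.T @ y) for X, y in zip(Xs, ys)]
parity = (sum(M for M, _ in moments), sum(v for _, v in moments))

w = np.random.randn(3)
grads = [M @ w - v for M, v in moments]       # per-worker encoded gradients
parity_grad = parity[0] @ w - parity[1]       # equals the sum of all grads

straggler = 2                                  # worker 2 never responds
recovered = parity_grad - sum(g for i, g in enumerate(grads) if i != straggler)
assert np.allclose(recovered, grads[straggler])
full_gradient = parity_grad                    # master already holds the total
```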
Addressing the straggler problem for iterative convergent parallel ML
2016
Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16
FlexRR provides a scalable, efficient solution to the straggler problem for iterative machine learning (ML). ...
... (e.g., per iteration) barriers used in traditional BSP-based distributed ML implementations cause every transient slowdown of any worker thread to delay all others. ...
This research is supported in part by Intel as part of the Intel Science and Technology Center for Cloud Computing (ISTC-CC), National Science Foundation under awards CNS-1042537, CCF-1533858, CNS-1042543 ...
doi:10.1145/2987550.2987554
dblp:conf/cloud/HarlapCDWGGGX16
fatcat:ajh5kcppyrhxpnoqrmybkkly2i
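A quick Monte Carlo sketch of the barrier problem quoted above (the delay distribution is a made-up illustration): under BSP, an iteration ends only when the slowest worker finishes, so expected iteration time grows with the number of workers even though every worker is fast on average.

```python
import random

def bsp_iteration_time(num_workers, trials=2000):
    total = 0.0
    for _ in range(trials):
        # per-worker time: fixed work plus a transient exponential slowdown
        total += max(1.0 + random.expovariate(5.0) for _ in range(num_workers))
    return total / trials

for n in (1, 10, 100):
    print(n, round(bsp_iteration_time(n), 3))   # mean of the max rises with n
```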
Latency Analysis of Coded Computation Schemes over Wireless Networks
[article]
2017
arXiv
pre-print
In particular, optimal coding schemes for minimizing latency in distributed computation of linear functions and mitigating the effect of stragglers were proposed for a wired network, where the workers can ...
In this paper, we focus on the problem of coded computation over a wireless master-worker setup with straggling workers, where only one worker can transmit the result of its local computation back to the ...
The traditional approach for mitigating these bottlenecks is to introduce computation redundancy in the form of task replicas. ...
arXiv:1707.00040v1
fatcat:ibkycb2rtrfhziwyv66tothr6a
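A toy simulation of the wireless bottleneck described above, with made-up timing constants: workers finish computing at random times but share one uplink, so the master's latency is the time until the k-th result has been transmitted, not merely computed.

```python
import random

def coded_latency(n, k, tx_time=0.1):
    finish = sorted(random.expovariate(1.0) for _ in range(n))
    t = 0.0
    for f in finish[:k]:              # master needs any k of the n coded results
        t = max(t, f) + tx_time       # wait for the shared channel, then transmit
    return t

trials = 10000
for n in (5, 10, 20):                 # more redundancy -> earlier k-th arrival,
    avg = sum(coded_latency(n, 5) for _ in range(trials)) / trials
    print(n, round(avg, 3))           # but every result still pays the uplink cost
```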
Mitigate data skew caused stragglers through ImKP partition in MapReduce
2017
2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC)
Speculative execution is the mechanism adopted by the current MapReduce framework for dealing with the straggler problem; it works by creating redundant copies of identified stragglers. ...
In this paper, we focus on mitigating Reduce stragglers caused by data skew and propose ImKP, an Intermediate Key Pre-processing framework that enables evenly distributed partitioning of Reduce inputs. ...
Among Reduce skew handling approaches, Co-worker [10] works as follows: once a straggler is identified, a reserved co-worker task helps process its remaining data. ...
doi:10.1109/pccc.2017.8280475
dblp:conf/ipccc/OuyangZCTX17
fatcat:ory4bvmpofdpfnzxnjcjumut2i
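A minimal sketch in the spirit of key pre-processing (the greedy rule below is an illustration, not ImKP's algorithm): counting key frequencies in a pre-processing pass and assigning the heaviest keys first to the least-loaded reducer avoids the hot-key imbalance that hash partitioning can produce.

```python
import heapq
from collections import Counter

def skew_aware_partition(records, num_reducers):
    freq = Counter(k for k, _ in records)          # pre-processing pass over keys
    heap = [(0, r) for r in range(num_reducers)]   # (load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for key, count in freq.most_common():          # heaviest keys first
        load, r = heapq.heappop(heap)              # least-loaded reducer
        assignment[key] = r
        heapq.heappush(heap, (load + count, r))
    return assignment

records = [("hot", 1)] * 90 + [("a", 1)] * 10 + [("b", 1)] * 10 + [("c", 1)] * 10
print(skew_aware_partition(records, num_reducers=2))  # hot key isolated on its own reducer
```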
Wrangler: Predictable and Faster Jobs using Fewer Resources
2014
Proceedings of the ACM Symposium on Cloud Computing - SoCC '14
Existing modeling-based approaches are hard to rely on for production-level adoption due to modeling errors. We present Wrangler, a system that proactively avoids situations that cause stragglers. ...
For production-level workloads from Facebook and Cloudera's customers, Wrangler improves the 99th percentile job completion time by up to 61% as compared to speculative execution, a widely used straggler ...
We also thank our shepherd, Fred Douglis, for help in shaping the final version of the paper. ...
doi:10.1145/2670979.2671005
dblp:conf/cloud/YadwadkarAK14
fatcat:rfwiymfinrdj3iq4446ogppvee
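A toy sketch of proactive avoidance in Wrangler's style, where a hand-written threshold rule stands in for the learned per-node predictor and the confidence knob is a made-up parameter:

```python
import random

def predict_straggler_prob(node_load):
    return min(1.0, node_load / 10.0)              # stand-in for a learned model

def schedule(task, nodes, threshold=0.7):
    safe = [n for n in nodes if predict_straggler_prob(n["load"]) < threshold]
    if not safe:
        return min(nodes, key=lambda n: n["load"])  # fall back to least loaded
    return random.choice(safe)                      # avoid predicted stragglers

nodes = [{"id": i, "load": random.uniform(0, 12)} for i in range(5)]
print(schedule("task-1", nodes)["id"])
```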
Speculative pipelining for compute cloud programming
2010
MILCOM 2010 - Military Communications Conference
... compute clouds. ...
These phases can experience unpredictable delays when available computing and network capacities fluctuate or when there are large disparities in inter-node communication delays, as can occur on shared ...
... Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. ...
doi:10.1109/milcom.2010.5680451
fatcat:7ulyy646ljacjm4sapbtxbhqtu
ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding
[article]
2019
arXiv
pre-print
Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ...
We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. ...
... and AWS Cloud Credits for Research from Amazon. ...
arXiv:1901.09671v1
fatcat:lkhtdq5lhjb3phyxswv5jblbem
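A simplified sketch of approximate gradient coding with a 2-fold cyclic repetition placement (ErasureHead's construction and decoder differ in details): every data block lives on two workers, the master keeps one copy of each block it receives, and blocks that never arrive contribute approximation error instead of delay.

```python
import numpy as np

def approx_gradient(block_grads, assignments, arrived):
    seen = set()
    total = np.zeros_like(block_grads[0])
    for w in arrived:                        # use results in order of arrival
        for b in assignments[w]:
            if b not in seen:                # count each block at most once
                seen.add(b)
                total += block_grads[b]
    return total / max(len(seen), 1)         # average over recovered blocks

block_grads = {b: np.random.randn(3) for b in range(4)}
assignments = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 0]}    # cyclic 2-fold placement
g = approx_gradient(block_grads, assignments, arrived=[0, 1])  # workers 2, 3 straggle
exact = sum(block_grads.values()) / 4
print(np.linalg.norm(g - exact))             # small residual: block 3 never arrived
```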
Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
[article]
2018
arXiv
pre-print
We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left ...
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. ...
Acknowledgments The work of Can Karakus and Suhas Diggavi was supported in part by NSF grants #1314937 and #1514531. ...
arXiv:1803.05397v1
fatcat:s7773b2nunbsnf6bpsyzazbuty
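A minimal sketch of the encoding idea, assuming a plain Gaussian encoding matrix rather than the paper's constructions: the least-squares data (A, b) is lifted to a redundant representation (SA, Sb), so encoded rows held by straggling workers can simply be dropped while the reduced problem still approximates the original solution.

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(100, 5)
x_true = np.random.randn(5)
b = A @ x_true + 0.01 * np.random.randn(100)

m = 150                                     # redundant encoded dimension > 100
S = np.random.randn(m, 100) / np.sqrt(m)    # random encoding matrix
SA, Sb = S @ A, S @ b

keep = np.random.choice(m, size=100, replace=False)   # 50 encoded rows straggle
x_hat = np.linalg.lstsq(SA[keep], Sb[keep], rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_hat - x_exact))      # small: redundancy absorbed the stragglers
```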
Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning
[article]
2022
arXiv
pre-print
AVCC leverages coded computing just for handling stragglers and privacy, and then uses an orthogonal approach that leverages verifiable computing to mitigate Byzantine workers. ...
Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. ...
The workers must remain oblivious to the ...
... mitigating straggler effects and for tackling Byzantine nodes. ...
arXiv:2107.12958v2
fatcat:f4zr6cymjray3mwdcin2dsyqoi
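One classical primitive for the verifiable-computing side of such a design is Freivalds' check (shown below as an illustration; AVCC's actual protocol details differ): a claimed matrix product is validated in O(n^2) time per round using a random test vector, so a Byzantine worker's wrong answer is caught with high probability without recomputing the O(n^3) product.

```python
import numpy as np

def freivalds_check(A, B, C_claimed, rounds=10):
    for _ in range(rounds):
        r = np.random.randint(0, 2, size=(B.shape[1], 1)).astype(float)
        if not np.allclose(A @ (B @ r), C_claimed @ r):
            return False                     # caught a wrong (Byzantine) result
    return True                              # accept with high confidence

A, B = np.random.randn(50, 50), np.random.randn(50, 50)
honest = A @ B
byzantine = honest.copy()
byzantine[3, 7] += 1.0                       # a single corrupted entry
print(freivalds_check(A, B, honest))         # True
print(freivalds_check(A, B, byzantine))      # False with high probability
```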