Serverless Straggler Mitigation using Local Error-Correcting Codes
[article]
2020
arXiv
pre-print
The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. ...
We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning ...
STRAGGLER RESILIENCE IN SERVERLESS COMPUTING USING CODES
A. ...
arXiv:2001.07490v1
fatcat:ptbzh4ld3jezphosqkylgpadni
OverSketched Newton: Fast Convex Optimization for Serverless Systems
[article]
2020
arXiv
pre-print
These sketching methods lead to inbuilt resiliency against stragglers that are a characteristic of serverless architectures. ...
Depending on whether the problem is strongly convex or not, we propose different iteration updates using the approximate Hessian. ...
For straggler mitigation during gradient calculation, we use the recently proposed technique based on error-correcting codes to create redundant computation [33] [34] [35]. ...
arXiv:1903.08857v3
fatcat:luuf7mcdm5dk3l4bhpmf4khmq4
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
[article]
2021
arXiv
pre-print
However, the quickly moving technology hinders reproducibility, and the lack of a standardized benchmarking suite leads to ad-hoc solutions and microbenchmarks being used in serverless research, further ...
To address this challenge, we propose the Serverless Benchmark Suite: the first benchmark for FaaS computing that systematically covers a wide spectrum of cloud resources and applications. ...
Serverless functions pose new challenges due to a lack of code and data locality. ...
arXiv:2012.14132v2
fatcat:i6hehq4zqfht3f6ovuljvkzvsu
numpywren: serverless linear algebra
[article]
2018
arXiv
pre-print
Linear algebra operations are widely used in scientific computing and machine learning applications. ...
We present numpywren, a system for linear algebra built on a serverless architecture. ...
Straggler Mitigation: The lease mechanism also enables straggler mitigation by default. If a worker stalls or is slow, it can fail to renew a lease before it expires. ...
arXiv:1810.09679v1
fatcat:lxv65s42rzd6vomkg2pff2akhm
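The lease idea mentioned in the numpywren snippet above (a stalled or slow worker fails to renew its lease, so the task can be picked up again) can be illustrated with a toy queue. This is a minimal sketch under stated assumptions, not numpywren's API: `LeaseQueue`, `claim`, `renew`, `complete`, the task name `tile-0`, and the explicit `now` argument are all hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    worker: str
    expires_at: float


class LeaseQueue:
    """Toy task queue (hypothetical): a task may be re-claimed once its lease expires."""

    def __init__(self, tasks, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.pending = list(tasks)             # tasks not yet finished
        self.leases: Dict[str, Lease] = {}     # task -> current lease

    def claim(self, worker: str, now: float) -> Optional[str]:
        """Hand out the first pending task that has no live lease."""
        for task in self.pending:
            lease = self.leases.get(task)
            if lease is None or lease.expires_at <= now:
                self.leases[task] = Lease(worker, now + self.lease_seconds)
                return task
        return None

    def renew(self, task: str, worker: str, now: float) -> bool:
        """Extend the lease; fails if it already expired or changed hands."""
        lease = self.leases.get(task)
        if lease is None or lease.worker != worker or lease.expires_at <= now:
            return False
        lease.expires_at = now + self.lease_seconds
        return True

    def complete(self, task: str) -> None:
        """Mark a task finished; a late duplicate completion is ignored."""
        if task in self.pending:
            self.pending.remove(task)
            self.leases.pop(task, None)


queue = LeaseQueue(["tile-0"], lease_seconds=10.0)
first = queue.claim("slow-worker", now=0.0)      # slow worker takes the task
blocked = queue.claim("fast-worker", now=5.0)    # lease still live: nothing to claim
late_renewal = queue.renew("tile-0", "slow-worker", now=12.0)  # renewal arrives too late
second = queue.claim("fast-worker", now=12.0)    # expired lease: task is handed out again
queue.complete("tile-0")
```

Passing `now` explicitly keeps the sketch deterministic; a real system would read a clock and would also need to make task completion idempotent, since the slow worker may still finish later.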
Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication
[article]
2019
arXiv
pre-print
Recently proposed fixed-rate erasure coding strategies can handle unpredictable node slowdown, but they ignore partial work done by straggling nodes thus resulting in a lot of redundant computation. ...
We conduct experiments in three computing environments: local parallel computing, Amazon EC2, and Amazon Lambda, which show that rateless coding gives as much as 3× speed-up over uncoded schemes. ...
The rows of $A$ are encoded using an error correcting code to give the $m_e \times n$ encoded matrix $A_e$, where $m_e \geq m$. We denote the amount of redundancy added by the parameter $\alpha = m_e/m$. ...
arXiv:1804.10331v5
fatcat:46a5afq3bfcazas72bdrnehbwi
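The encoding described in the snippet above (rows of A encoded into a taller matrix A_e with m_e ≥ m rows and redundancy α = m_e/m) can be illustrated with a much simpler code than the rateless scheme the paper proposes. The sketch below assumes a single parity block: A is split into k row blocks, one extra block holding their sum is appended, and the product Ax is recovered even if any one worker never returns. The function names `encode_with_parity` and `decode` are hypothetical, and the code tolerates only one straggler.

```python
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(rows: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product, one dot product per row."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode_with_parity(A: Matrix, k: int) -> List[Matrix]:
    """Split A into k equal row blocks and append one parity block (their sum)."""
    m = len(A)
    if m % k != 0:
        raise ValueError("this sketch assumes k divides the number of rows")
    size = m // k
    blocks = [A[i * size:(i + 1) * size] for i in range(k)]
    parity = [
        [sum(block[r][c] for block in blocks) for c in range(len(A[0]))]
        for r in range(size)
    ]
    return blocks + [parity]


def decode(results: List[Optional[Vector]]) -> Vector:
    """Recover A @ x from k+1 worker results, at most one of which is None."""
    missing = [i for i, r in enumerate(results) if r is None]
    if len(missing) > 1:
        raise ValueError("a single parity block tolerates only one straggler")
    k = len(results) - 1
    data = list(results[:k])
    if missing and missing[0] < k:
        # Missing data block = parity result minus the sum of the other data results.
        parity = results[k]
        others = [r for i, r in enumerate(data) if i != missing[0]]
        data[missing[0]] = [
            p - sum(o[j] for o in others) for j, p in enumerate(parity)
        ]
    return [value for block in data for value in block]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, -1.0]
encoded_blocks = encode_with_parity(A, k=2)           # 3 blocks of 2 rows: m_e = 6
alpha = sum(len(b) for b in encoded_blocks) / len(A)  # redundancy m_e / m
worker_results = [matvec(b, x) for b in encoded_blocks]
worker_results[0] = None                              # pretend worker 0 straggles
recovered = decode(worker_results)
```

Here m = 4 and m_e = 6, so α = 1.5. A fixed-rate code like this discards whatever partial work the straggler completed, which is the inefficiency the abstract says rateless codes are meant to avoid.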
Cloudburst: Stateful Functions-as-a-Service
[article]
2020
arXiv
pre-print
Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. ...
We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. ...
We could not deploy SAND ourselves because the source code is unavailable, so we used a hosted offering [75]. ...
arXiv:2001.04592v2
fatcat:txil6f7nprcezlmc45qscw7voe
Distributed Averaging Methods for Randomized Second Order Optimization
[article]
2020
arXiv
pre-print
Additionally, we demonstrate the implications of our theoretical findings via large scale experiments performed on a serverless computing platform. ...
A possible solution to the issue of straggling nodes is to use error correcting codes to insert redundancy to computation and hence to avoid waiting for the outputs of all of the worker nodes (Lee et ...
We identify that implementing straggler mitigation for solving large scale problems via approximate second order optimization methods such as distributed Newton sketch is a promising direction. ...
arXiv:2002.06540v1
fatcat:d4jwvyj5s5gk7bpymr6tj7urca
Optimizing Prediction Serving on Low-Latency Serverless Dataflow
[article]
2020
arXiv
pre-print
We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend. ...
Similar to straggler mitigation in MapReduce [11], competitive execution of inference has been shown to improve tail latencies for ML models [36, 53]. Operator Autoscaling and Placement. ...
Similar to our Sagemaker deployments, we use custom code to move each request through the pipeline. ...
arXiv:2007.05832v1
fatcat:m3q5zhbfsbgcdijtm7stgyko7u
Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds
[article]
2022
arXiv
pre-print
Additionally, we demonstrate the implications of our theoretical findings via large scale experiments on a serverless cloud computing platform. ...
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems. ...
A possible solution to the issue of straggling nodes is to use error correcting codes to insert redundancy to computation and hence to avoid waiting for the outputs of all of the worker nodes [41], ...
arXiv:2203.09755v1
fatcat:7vwr4neazfg7higpowlhg2ztr4
Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads
[article]
2021
arXiv
pre-print
Just using Lambdas on top of CPU servers offers up to 2.75x more performance-per-dollar than training only with CPU servers. ...
Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. The key insight guiding our design is computation separation. ...
In this case, the GA task of v uses stale values from its neighbors (i.e., the same as what were used in the previous epoch). This would clearly generate large errors at the end of the epoch. ...
arXiv:2105.11118v2
fatcat:qzflntrg5bbqpfcmbu4xh6dpne
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
[article]
2020
arXiv
pre-print
Yet the underlying execution times are not fundamentally unpredictable - on the contrary we observe that inference using Deep Neural Network (DNN) models has deterministic performance. ...
Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. ...
Individual prediction errors can compound, leading to increased completion time error. For INFER actions, the error compounds 4×, with a 99th percentile completion error of ≈1 ms. ...
arXiv:2006.02464v2
fatcat:f7quwroge5hmxpw66a5oujqhhu
Triggerflow: Trigger-based Orchestration of Serverless Workflows
[article]
2020
pre-print
We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive schedulers (State Machines, Directed Acyclic Graphs, Workflow as code). ...
We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows built on top of Knative Eventing and Kubernetes technologies. ...
Furthermore, this approach gives us the opportunity to handle errors during a workflow runtime. ...
doi:10.1145/3401025.3401731
arXiv:2006.08654v1
fatcat:b5onuunnvfbnrmy3lyx36hh35i
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
[article]
2022
arXiv
pre-print
In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration. ...
Consequently, MAML is limited to using very simple, shallow ANN architectures. To mitigate this issue, Rusu et al. ...
that are more likely to be stragglers in advance [15]. ...
arXiv:2205.01423v1
fatcat:ul24ibzts5eutoozdh4y3n5kgq
Multi-tenant mobile offloading systems for real-time computer vision applications
2019
Proceedings of the 20th International Conference on Distributed Computing and Networking - ICDCN '19
In pursuit of low latency and high throughput, Clipper adopts caching, adaptive batching, and straggler mitigation techniques. ...
It also presents straggler mitigation techniques to reduce the tail delay. These works do not consider co-locating RT and non-RT tasks, which is addressed in our work. ...
Implementing run method for object detection microservice using data parallelization OBJECT DET PARALLEL. # A microservice for object detection using data parallelization. ...
doi:10.1145/3288599.3288634
dblp:conf/icdcn/FangLS019
fatcat:qpib2wkm7jdnfg7k64eh3hwje4
Hints and Principles for Computer System Design
[article]
2021
arXiv
pre-print
Two years later we built the Alto 2, using 4k RAM chips and error correction. ...
If is too big (perhaps because the chance of corrupting a message bit is too great), you can make it smaller with forward error correction (an error-correcting code). ...
arXiv:2011.02455v3
fatcat:jolyz5lknjdbpjpxjcrx5rh6fa