
Serverless Straggler Mitigation using Local Error-Correcting Codes [article]

Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran
2020 arXiv   pre-print
The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers.  ...  We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning  ...  STRAGGLER RESILIENCE IN SERVERLESS COMPUTING USING CODES A.  ... 
arXiv:2001.07490v1 fatcat:ptbzh4ld3jezphosqkylgpadni

OverSketched Newton: Fast Convex Optimization for Serverless Systems [article]

Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran
2020 arXiv   pre-print
These sketching methods lead to inbuilt resiliency against stragglers that are a characteristic of serverless architectures.  ...  Depending on whether the problem is strongly convex or not, we propose different iteration updates using the approximate Hessian.  ...  For straggler mitigation during gradient calculation, we use the recently proposed technique based on error-correcting codes to create redundant computation [33] [34] [35].  ... 
arXiv:1903.08857v3 fatcat:luuf7mcdm5dk3l4bhpmf4khmq4

SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing [article]

Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, Torsten Hoefler
2021 arXiv   pre-print
However, the quickly moving technology hinders reproducibility, and the lack of a standardized benchmarking suite leads to ad-hoc solutions and microbenchmarks being used in serverless research, further  ...  To address this challenge, we propose the Serverless Benchmark Suite: the first benchmark for FaaS computing that systematically covers a wide spectrum of cloud resources and applications.  ...  Serverless functions pose new challenges due to a lack of code and data locality.  ... 
arXiv:2012.14132v2 fatcat:i6hehq4zqfht3f6ovuljvkzvsu

numpywren: serverless linear algebra [article]

Vaishaal Shankar, Karl Krauth, Qifan Pu, Eric Jonas, Shivaram Venkataraman, Ion Stoica, Benjamin Recht, Jonathan Ragan-Kelley
2018 arXiv   pre-print
Linear algebra operations are widely used in scientific computing and machine learning applications.  ...  We present numpywren, a system for linear algebra built on a serverless architecture.  ...  Straggler Mitigation: The lease mechanism also enables straggler mitigation by default. If a worker stalls or is slow, it can fail to renew a lease before it expires.  ... 
arXiv:1810.09679v1 fatcat:lxv65s42rzd6vomkg2pff2akhm

Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication [article]

Ankur Mallick, Malhar Chaudhari, Utsav Sheth, Ganesh Palanikumar, Gauri Joshi
2019 arXiv   pre-print
Recently proposed fixed-rate erasure coding strategies can handle unpredictable node slowdown, but they ignore partial work done by straggling nodes thus resulting in a lot of redundant computation.  ...  We conduct experiments in three computing environments: local parallel computing, Amazon EC2, and Amazon Lambda, which show that rateless coding gives as much as 3× speed-up over uncoded schemes.  ...  The rows of A are encoded using an error correcting code to give the m_e × n encoded matrix A_e, where m_e ≥ m. We denote the amount of redundancy added by the parameter α = m_e/m.  ... 
arXiv:1804.10331v5 fatcat:46a5afq3bfcazas72bdrnehbwi

Cloudburst: Stateful Functions-as-a-Service [article]

Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Jose M. Faleiro, Joseph E. Gonzalez, Joseph M. Hellerstein, Alexey Tumanov
2020 arXiv   pre-print
Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular.  ...  We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms.  ...  We could not deploy SAND ourselves because the source code is unavailable, so we used a hosted offering [75] .  ... 
arXiv:2001.04592v2 fatcat:txil6f7nprcezlmc45qscw7voe

Distributed Averaging Methods for Randomized Second Order Optimization [article]

Burak Bartan, Mert Pilanci
2020 arXiv   pre-print
Additionally, we demonstrate the implications of our theoretical findings via large scale experiments performed on a serverless computing platform.  ...  A possible solution to the issue of straggling nodes is to use error correcting codes to insert redundancy to computation and hence to avoid waiting for the outputs of all of the worker nodes (Lee et  ...  We identify that implementing straggler mitigation for solving large scale problems via approximate second order optimization methods such as distributed Newton sketch is a promising direction.  ... 
arXiv:2002.06540v1 fatcat:d4jwvyj5s5gk7bpymr6tj7urca

Optimizing Prediction Serving on Low-Latency Serverless Dataflow [article]

Vikram Sreekanti, Harikaran Subbaraj, Chenggang Wu, Joseph E. Gonzalez, Joseph M. Hellerstein
2020 arXiv   pre-print
We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend.  ...  Similar to straggler mitigation in MapReduce [11], competitive execution of inference has been shown to improve tail latencies for ML models [36, 53]. Operator Autoscaling and Placement.  ...  Similar to our Sagemaker deployments, we use custom code to move each request through the pipeline.  ... 
arXiv:2007.05832v1 fatcat:m3q5zhbfsbgcdijtm7stgyko7u

Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [article]

Burak Bartan, Mert Pilanci
2022 arXiv   pre-print
Additionally, we demonstrate the implications of our theoretical findings via large scale experiments on a serverless cloud computing platform.  ...  We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.  ...  A possible solution to the issue of straggling nodes is to use error correcting codes to insert redundancy to computation and hence to avoid waiting for the outputs of all of the worker nodes [41],  ... 
arXiv:2203.09755v1 fatcat:7vwr4neazfg7higpowlhg2ztr4

Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads [article]

John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, Guoqing Harry Xu
2021 arXiv   pre-print
Just using Lambdas on top of CPU servers offers up to 2.75x more performance-per-dollar than training only with CPU servers.  ...  Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. The key insight guiding our design is computation separation.  ...  In this case, the GA task of v uses stale values from its neighbors (i.e., the same as what were used in the previous epoch). This would clearly generate large errors at the end of the epoch.  ... 
arXiv:2105.11118v2 fatcat:qzflntrg5bbqpfcmbu4xh6dpne

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up [article]

Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace
2020 arXiv   pre-print
Yet the underlying execution times are not fundamentally unpredictable - on the contrary we observe that inference using Deep Neural Network (DNN) models has deterministic performance.  ...  Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times.  ...  Individual prediction errors can compound, leading to increased completion time error. For INFER actions, the error compounds 4×, with a 99th percentile completion error of ≈1 ms.  ... 
arXiv:2006.02464v2 fatcat:f7quwroge5hmxpw66a5oujqhhu

Triggerflow: Trigger-based Orchestration of Serverless Workflows [article]

Pedro García-López, Aitor Arjona, Josep Sampe, Aleksander Slominski, Lionel Villard
2020 pre-print
We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive schedulers (State Machines, Directed Acyclic Graphs, Workflow as code).  ...  We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows built on top of Knative Eventing and Kubernetes technologies.  ...  Furthermore, this approach gives us the opportunity to handle errors during a workflow runtime.  ... 
doi:10.1145/3401025.3401731 arXiv:2006.08654v1 fatcat:b5onuunnvfbnrmy3lyx36hh35i

Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration [article]

Henna Kokkonen, Lauri Lovén, Naser Hossein Motlagh, Juha Partala, Alfonso González-Gil, Ester Sola, Iñigo Angulo, Madhusanka Liyanage, Teemu Leppänen, Tri Nguyen, Panos Kostakos, Mehdi Bennis (+4 others)
2022 arXiv   pre-print
In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration.  ...  Consequently, MAML is limited to using very simple, shallow ANN architectures. To mitigate this issue, Rusu et al.  ...  that are more likely to be stragglers in advance [15] .  ... 
arXiv:2205.01423v1 fatcat:ul24ibzts5eutoozdh4y3n5kgq

Multi-tenant mobile offloading systems for real-time computer vision applications

Zhou Fang, Jeng-Hau Lin, Mani B. Srivastava, Rajesh K. Gupta
2019 Proceedings of the 20th International Conference on Distributed Computing and Networking - ICDCN '19  
In pursuit of low latency and high throughput, Clipper adopts caching, adaptive batching, and straggler mitigation techniques.  ...  It also presents straggler mitigation techniques to reduce the tail delay. These works do not consider co-locating RT and non-RT tasks, which is addressed in our work.  ...  Implementing run method for object detection microservice using data parallelization OBJECT DET PARALLEL. 1 # A microservice for object detection using data parallelization .  ... 
doi:10.1145/3288599.3288634 dblp:conf/icdcn/FangLS019 fatcat:qpib2wkm7jdnfg7k64eh3hwje4

Hints and Principles for Computer System Design [article]

Butler Lampson
2021 arXiv   pre-print
Two years later we built the Alto 2, using 4k RAM chips and error correction.  ...  If is too big (perhaps because the chance of corrupting a message bit is too great), you can make it smaller with forward error correction (an error-correcting code).  ... 
arXiv:2011.02455v3 fatcat:jolyz5lknjdbpjpxjcrx5rh6fa