No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy [article]

Amit Samanta and Suhas Shrinivasan and Antoine Kaufmann and Jonathan Mace
2019 arXiv   pre-print
With the rise of machine learning, inference on deep neural networks (DNNs) has become a core building block on the critical path for many cloud applications.  ...  We argue that DNN inference is an ideal candidate for a multi-tenant system because of its narrow and well-defined interface and predictable resource requirements.  ...  Overall, the total inference latency will depend on a combination of execution latency (CPU, GPU, or other accelerator) and transfer latency (PCIe, disk, and/or network).  ... 
arXiv:1901.06887v2 fatcat:k6bgjy7m3vcvrog5skrh55bzza

Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models [article]

Matthew LeMay and Shijian Li and Tian Guo
2020 arXiv   pre-print
To accommodate high inference throughput, it is common to host a single pre-trained Convolutional Neural Network (CNN) in dedicated cloud-based servers with hardware accelerators such as Graphics Processing  ...  In this paper, we present Perseus, a measurement framework that provides the basis for understanding the performance and cost trade-offs of multi-tenant model serving.  ...  ACKNOWLEDGMENT The authors would like to thank National Science Foundation grants #1755659 and #1815619, and Google Cloud Platform Research credits.  ... 
arXiv:1912.02322v2 fatcat:f3tvqccovje4xnjz5bjzj7eiwy

AI on the Edge: Rethinking AI-based IoT Applications Using Specialized Edge Architectures [article]

Qianlin Liang, Prashant Shenoy, David Irwin
2020 arXiv   pre-print
We find that edge accelerators can support varying degrees of concurrency for multi-tenant inference applications, but lack isolation mechanisms necessary for edge cloud multi-tenant hosting.  ...  The attractiveness of edge computing has been further enhanced due to the recent availability of special-purpose hardware to accelerate specific compute tasks, such as deep learning inference, on edge  ...  The benefits of model compression also depend on the network latency: the higher the latency to the cloud, the more valuable is the ability to handle inference locally and avoid an expensive network hop  ... 
arXiv:2003.12488v1 fatcat:rice6s77jjevlk3ir4em6doc2e


Keon Jang, Justine Sherry, Hitesh Ballani, Toby Moncaster
2015 Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication - SIGCOMM '15  
Many cloud applications can benefit from guaranteed latency for their network messages, however providing such predictability is hard, especially in multi-tenant datacenters.  ...  Silo does not require any changes to applications, guest OSes or network switches. We show that Silo can ensure predictable message latency for cloud applications while imposing low overhead.  ...  We thank Richard Black and Austin Donnelly for the suggestion regarding the packet pacing method. We thank the anonymous reviewers and our shepherd Teemu Koponen for their helpful comments.  ... 
doi:10.1145/2785956.2787479 dblp:conf/sigcomm/JangSBM15 fatcat:hxxcjr6rfbcpbi3pj2xceihekq

Dynamic Space-Time Scheduling for GPU Inference [article]

Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, Ion Stoica
2018 arXiv   pre-print
Serving deep neural networks in latency critical interactive settings often requires GPU acceleration.  ...  We evaluate the performance trade-offs of each approach with respect to resource-efficiency, latency predictability, and isolation when compared with conventional batched inference.  ...  pass of the network.  ... 
arXiv:1901.00041v1 fatcat:hzxlziaftvbypnwtls5a4uqk6e

Cloud Datacenter SDN Monitoring

Arjun Roy, Deepak Bansal, David Brumley, Harish Kumar Chandrappa, Parag Sharma, Rishabh Tewari, Behnaz Arzani, Alex C. Snoeren
2018 Proceedings of the Internet Measurement Conference 2018 on - IMC '18  
We present a first look into the nuances of monitoring these "virtualized" networks through the lens of a large cloud provider.  ...  We show that interactions between the virtualization, tenant software, and lower layers of the network fabric both simplify and complicate different aspects of fault detection and diagnosis efforts.  ...  smaller cloud tenants.  ... 
doi:10.1145/3278532.3278572 fatcat:axddmo7dcnhypjttb24roqgbye

ClouDiA: a deployment advisor for public clouds

Tao Zou, Ronan Le Bras, Marcos Vaz Salles, Alan Demers, Johannes Gehrke
2015 The VLDB journal  
the cloud.  ...  An increasing number of distributed data-driven applications are moving into shared public clouds.  ...  Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.  ... 
doi:10.1007/s00778-014-0375-9 fatcat:xz4dc6k2pvajpi4mn3qwwdpyti

Inferring Cloud-Network Slice's Requirements from Non-Structured Service Description

Rafael Pasquini, Javier Baliosian, Joan Serrat, Juan-Luis Gorricho, Augusto Neto, Fabio Verdi
2020 NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium  
The ultimate goal is to allow a NECOS tenant to place a slice request describing key aspects of its service and, at the end of the process, to receive its slice up and running.  ...  based upon Structured Output Learning [1], a machine learning umbrella to infer dependencies in between arbitrary inputs and outputs.  ...  ACKNOWLEDGEMENTS This research was supported by the H2020 4th EU-BR Collaborative Call (Grant Agreement no. 777067 - Novel Enablers for Cloud Slicing).  ... 
doi:10.1109/noms47738.2020.9110413 dblp:conf/noms/PasquiniB0G0V20 fatcat:i36yj6izlngv5cdxx346pi2j64


Tao Zou, Ronan Le Bras, Marcos Vaz Salles, Alan Demers, Johannes Gehrke
2012 Proceedings of the VLDB Endowment  
the cloud.  ...  An increasing number of distributed data-driven applications are moving into shared public clouds.  ...  Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.  ... 
doi:10.14778/2535568.2448945 fatcat:yylwe55mdvhljlzt5farplmjoe

SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [article]

Chulhong Min, Akhil Mathur, Utku Gunay Acer, Alessandro Montanari, Fahim Kawsar
2021 arXiv   pre-print
An adaptive scheduler then orchestrates the best-effort executions of multiple models across heterogeneous accelerators, balancing latency and throughput.  ...  We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort  ...  More specifically, we count the number of inferences per second, for which the end-to-end latency is below the latency requirement of the model.  ... 
arXiv:2109.03947v1 fatcat:4rkw6km2fvh5lgn4qm23cvf6za

Personalized Pseudonyms for Servers in the Cloud

Qiuyu Xiao, Michael K. Reiter, Yinqian Zhang
2017 Proceedings on Privacy Enhancing Technologies  
operator to prevent a network adversary from determining which of the cloud's tenant servers a client is accessing.  ...  In this paper we opportunistically leverage this trend to improve privacy of clients from network attackers residing between the clients and the cloud: We design a system that can be deployed by the cloud  ...  of the tenant server in the cloud.  ... 
doi:10.1515/popets-2017-0049 dblp:journals/popets/XiaoRZ17 fatcat:bmfepzltpnhthb6vvotuwfaxoy

Model-driven Cluster Resource Management for AI Workloads in Edge Clouds [article]

Qianlin Liang, Walid A. Hanafy, Ahmed Ali-Eldin, Prashant Shenoy
2022 arXiv   pre-print
network (DNN) inference run by these applications.  ...  In this paper, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors  ...  Edge Computing and AI Inference for IoT Edge clouds are a form of edge computing that involves deploying computing and storage resources at the edge of the network to provide low latency access to users  ... 
arXiv:2201.07312v1 fatcat:d4wdw7frbvcfvjwgfwvcod5ufa


Mojgan Ghasemi, Theophilus Benson, Jennifer Rexford
2017 Proceedings of the Symposium on SDN Research - SOSR '17  
Offline processing of logs is slow and inefficient, and instrumenting the end-host network stack would violate the tenants' rights to manage their own virtual machines (VMs).  ...  With more applications moving to the cloud, cloud providers need to diagnose performance problems in a timely manner.  ...  Yet, cloud providers must monitor performance within their own infrastructure, since they cannot modify the end-host network stack without violating tenant control over their own VMs.  ... 
doi:10.1145/3050220.3050228 dblp:conf/sosr/GhasemiBR17 fatcat:gm6xbhzoc5c27lf2iivlbokjve

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments [article]

Abdul Dakkak, Cheng Li, Simon Garcia de Gonzalo, Jinjun Xiong, Wen-mei Hwu
2018 arXiv   pre-print
Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads  ...  Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct  ...  The benefits of TrIMS to the cloud providers can be passed down to the users in the form of reducing latency or cost of inference.  ... 
arXiv:1811.09732v1 fatcat:ezx2b6snqbch3nf7mxxpw3qfgi

PTPmesh: Data Center Network Latency Measurements Using PTP

Diana Andreea Popescu, Andrew W. Moore
2017 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)  
We show how to use the Precision Time Protocol (PTP) to infer network latency and packet loss in data centers, and we conduct network latency and packet loss measurements in data centers from different  ...  cloud providers, using PTPd, an open-source software implementation of PTP.  ...  Since our goal is that cloud tenants use PTP as a tool to measure network conditions, we do not use PTP-enabled NICs in the next experiments, since these may not be available in the cloud data centers.  ... 
doi:10.1109/mascots.2017.30 dblp:conf/mascots/Popescu017 fatcat:2nk2wjnqcjbjzcwyzns2zmbfp4