Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
[article]
2021
arXiv
pre-print
To maximize the resource efficiency of inference servers, a key mechanism proposed in this paper is to exploit hardware support for spatial partitioning of GPU resources. ...
To address the two requirements of ML inference servers, this paper proposes a new ML inference scheduling framework for multi-model ML inference servers. ...
GSLICE [16] is a GPU-based inference serving platform which boosts performance by spatially sharing GPUs and hiding reorganization cost. ...
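The abstract above describes scheduling multiple models onto spatial partitions of a GPU. As a rough illustration of the idea (not the paper's actual algorithm), the sketch below greedily assigns each model a share of GPU compute, in the style of NVIDIA MPS active-thread percentages, so that each model's latency stays under its SLO; the simple latency model (latency inversely proportional to share) and all names are assumptions for illustration.

```python
# Toy SLO-aware GPU spatial partitioning (hypothetical model, not the
# paper's algorithm): latency(share) = base_latency_ms / (share / 100).

def min_share(base_latency_ms, slo_ms, step=10):
    """Smallest share (in `step`% increments) that keeps latency under the SLO."""
    for share in range(step, 101, step):
        if base_latency_ms / (share / 100) <= slo_ms:
            return share
    return None  # SLO unattainable even with the whole GPU

def partition_gpu(models):
    """Greedily pack models onto one GPU; returns (assignments, leftover %).

    `models` maps name -> (latency at 100% share in ms, SLO in ms).
    """
    assignments, budget = {}, 100
    # Place models with the tightest SLOs first, while budget remains.
    for name, (base, slo) in sorted(models.items(), key=lambda kv: kv[1][1]):
        share = min_share(base, slo)
        if share is not None and share <= budget:
            assignments[name] = share
            budget -= share
    return assignments, budget

models = {"resnet50": (10.0, 25.0), "bert": (20.0, 100.0)}
shares, free = partition_gpu(models)
# resnet50 needs 40% to hit its 25 ms SLO; bert needs 20% for 100 ms.
```

Real systems replace the analytic latency model with profiled per-partition latencies, but the packing step has this shape.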
arXiv:2109.01611v1
fatcat:twpsj5ke4nazjgi4uio3cemini
Edge Computing Technologies for Real-Time Video Stream Analytics
2021
Scientia Sinica Informationis
DeepRT and ECML [153] are GPU-based schedulers for edge inference. ECML is related to GSLICE [173]: it applies controlled spatial multiplexing to partition GPU resources among concurrent inference models, improving GPU utilization. ... Mainstream [154] ... achieving an F1 score of 87%. ...
[153] "DisBatcher" mechanism (controlled spatial multiplexing; self-learning adaptive batching)
• Reduces deadline miss rate
• Increases GPU utilization
• Only works on a single GPU
• ...
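The entry above pairs spatial multiplexing with self-learning adaptive batching. A minimal sketch of the batching side, under the assumption (not stated in the source) that it behaves like AIMD control: grow the batch while deadlines are met, halve it on a miss. The class and parameter names are hypothetical.

```python
# Hypothetical adaptive-batching controller (AIMD-style), illustrating the
# "self-learning adaptive batching" technique named above.

class AdaptiveBatcher:
    def __init__(self, deadline_ms, max_batch=64):
        self.deadline_ms = deadline_ms
        self.max_batch = max_batch
        self.batch = 1  # start conservatively with single-request batches

    def update(self, observed_latency_ms):
        """Pick the next batch size from the last batch's observed latency."""
        if observed_latency_ms <= self.deadline_ms:
            # Deadline met: additively grow the batch to raise GPU utilization.
            self.batch = min(self.batch + 1, self.max_batch)
        else:
            # Deadline missed: back off multiplicatively to recover quickly.
            self.batch = max(self.batch // 2, 1)
        return self.batch

b = AdaptiveBatcher(deadline_ms=50)
b.update(20)         # met deadline, batch grows to 2
b.update(30)         # met again, batch grows to 3
size = b.update(80)  # miss: batch halves back toward 1
```

The additive-increase/multiplicative-decrease split trades a little throughput for fast recovery from deadline misses, which is the usual priority when a miss rate is the target metric.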
doi:10.1360/ssi-2021-0133
fatcat:qs7jnvnknjhdrhfrru6rfbwuge