2 Hits in 1.5 sec

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning [article]

Seungbeom Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh
2021 arXiv   pre-print
To maximize the resource efficiency of inference servers, a key mechanism proposed in this paper is to exploit hardware support for spatial partitioning of GPU resources.  ...  To address the two requirements of ML inference servers, this paper proposes a new ML inference scheduling framework for multi-model ML inference servers.  ...  GSLICE [16] is a GPU-based inference serving platform which boosts performance by spatially sharing GPUs and hiding reorganization cost.  ... 
arXiv:2109.01611v1 fatcat:twpsj5ke4nazjgi4uio3cemini
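The "hardware support for spatial partitioning of GPU resources" this abstract mentions can be approximated in practice with CUDA MPS, whose `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable caps the fraction of SMs a client process may occupy. The sketch below is an illustration of that mechanism, not the paper's framework; the `mps_partition_env` helper and the 70/30 split are assumptions.

```python
import os

def mps_partition_env(thread_pct, base_env=None):
    """Build an environment for an inference worker limited to a fraction
    of the GPU's SMs via CUDA MPS (hypothetical helper, for illustration).

    CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is the MPS knob that caps the
    portion of streaming multiprocessors an MPS client may use."""
    env = dict(base_env if base_env is not None else os.environ)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_pct)
    return env

# Carve the GPU into two spatial partitions for two co-located models:
# 70% of SMs for a latency-critical model, 30% for a throughput model.
heavy = mps_partition_env(70, base_env={})
light = mps_partition_env(30, base_env={})
# Each dict would be passed as `env=` to subprocess.Popen when
# launching the corresponding model-serving worker under an MPS daemon.
```

Partitioning via MPS is advisory (clients share memory and can still interfere); MIG-capable GPUs offer stricter isolation at coarser granularity.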

Edge Computing Techniques for Real-Time Video Stream Analytics (面向实时视频流分析的边缘计算技术)

Zheng Yang, Xiaowu He, Jiaxing Wu, Xu Wang, Yi Zhao
2021 Scientia Sinica Informationis  
DeepRT and ECML [153] schedule inference by sharing the GPU; ECML, like GSLICE [173], applies controlled spatial multiplexing of GPU resources.  ...  Mainstream [154] achieves 87% F1.  ...  [153]: controlled spatial multiplexing with self-learning adaptive batching ("DisBatcher" mechanism) • Reduce deadline miss rate • Increase GPU utilization • Only works on a single GPU  ... 
doi:10.1360/ssi-2021-0133 fatcat:qs7jnvnknjhdrhfrru6rfbwuge
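The "self-learning adaptive batching" summarized above combines batching (for GPU utilization) with deadline awareness (to cut the miss rate). A minimal sketch of that trade-off, assuming a fixed per-request SLO and a known batch-inference latency estimate, is a batcher that flushes when either the batch fills or the oldest request is about to miss its deadline. The class and parameter names are illustrative, not the paper's DisBatcher.

```python
import time
from collections import deque

class AdaptiveBatcher:
    """Deadline-aware batcher: flush when the batch is full, or when the
    oldest queued request would miss its deadline if we waited longer.
    A simplified sketch of the adaptive-batching idea; all names and
    parameters here are assumptions for illustration."""

    def __init__(self, max_batch, deadline_s, infer_latency_s):
        self.max_batch = max_batch              # hardware/throughput cap
        self.deadline_s = deadline_s            # per-request SLO
        self.infer_latency_s = infer_latency_s  # estimated batch run time
        self.queue = deque()                    # (arrival_time, request)

    def submit(self, request, now=None):
        self.queue.append((time.monotonic() if now is None else now, request))

    def maybe_flush(self, now=None):
        """Return a batch to run now, or None if it is safe to keep waiting."""
        if not self.queue:
            return None
        now = time.monotonic() if now is None else now
        oldest_arrival, _ = self.queue[0]
        # Latest moment we can start inference and still meet the SLO
        # of the oldest request in the queue.
        must_start_by = oldest_arrival + self.deadline_s - self.infer_latency_s
        if len(self.queue) >= self.max_batch or now >= must_start_by:
            batch = [req for _, req in list(self.queue)[: self.max_batch]]
            for _ in batch:
                self.queue.popleft()
            return batch
        return None
```

A real system would additionally learn `infer_latency_s` online per batch size (the "self-learning" part) rather than treat it as a constant.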