Integrated ARM big.Little-Mali Pipeline for High-Throughput CNN Inference [post]

Ehsan Aghapour, A. Pathania, Gayathri Ananthanarayanan
2021 unpublished
<div>State-of-the-art Heterogeneous Multi-Processor Systems-on-Chips (HMPSoCs) can perform on-chip embedded inference on their CPUs and GPUs. Multi-component pipelining is the method of choice for providing high-throughput Convolutional Neural Network (CNN) inference on embedded platforms. In this work, we provide details for the first CPU-GPU pipeline design for CNN inference, called Pipe-All. Pipe-All uses the ARM-CL library to integrate an ARM big.Little CPU with an ARM Mali GPU. Pipe-All is the first three-stage CNN inference pipeline design, with ARM's big CPU cluster, Little CPU cluster, and Mali GPU as its stages. Pipe-All provides, on average, a 75.88% improvement in inference throughput (over peak single-component inference) on the Amlogic A311D HMPSoC in the Khadas VIM3 embedded platform. We also provide an open-source implementation of Pipe-All.</div><div>This paper has been submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) as a transactions brief paper (5 pages).</div>
doi:10.36227/techrxiv.14994885 fatcat:pyguawiux5extirswtk6vsjj7m