Work-in-Progress: Quantized NNs as the Definitive Solution for Inference on Low-Power ARM MCUs?

Manuele Rusci, Alessandro Capotondi, Francesco Conti, Luca Benini
2018 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
High energy efficiency and low memory footprint are the key requirements for deploying deep-learning-based analytics on low-power microcontrollers. Here we present work-in-progress results with Q-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7-class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for Q = 4 and Q = 2, low-memory-footprint QNNs can be deployed with an energy overhead of 30% and 36%, respectively, over the 8-bit CMSIS-NN baseline, due to the lack of quantization support in the ISA; ii) for Q = 1, native instructions can be used, yielding an energy and latency reduction of ∼3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for Q = 4, 13.6× for Q = 2, and 6.5× for binary NNs.
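The overhead for Q = 4 and Q = 2 versus the near-native speed of Q = 1 can be illustrated with a minimal C sketch (not the paper's actual CMSIS-NN extension): sub-byte weights must be unpacked in software before the ISA's 8-bit SIMD arithmetic can touch them, while a binarized dot product maps onto a bitwise XNOR followed by a population count. The function names `unpack_q4` and `bin_dot` are hypothetical, and `__builtin_popcount` is a GCC/Clang builtin standing in for a hardware popcount instruction.

```c
#include <stdint.h>

/* Hypothetical sketch: 4-bit weights are stored two per byte, so each
 * byte must be split and sign-extended in software -- the source of the
 * energy overhead reported for Q = 4 and Q = 2. */
static void unpack_q4(const uint8_t *packed, int8_t *out, int n_pairs)
{
    for (int i = 0; i < n_pairs; i++) {
        /* Low nibble: shift into the high bits, then arithmetic-shift
         * back down to sign-extend the 4-bit value to int8. */
        out[2 * i]     = (int8_t)(uint8_t)(packed[i] << 4) >> 4;
        /* High nibble: arithmetic shift sign-extends directly. */
        out[2 * i + 1] = (int8_t)packed[i] >> 4;
    }
}

/* Hypothetical sketch: for Q = 1, a 32-lane binarized dot product
 * (weights/activations in {-1, +1}, encoded as single bits) needs only
 * XNOR plus popcount, both of which map onto native instructions. */
static int bin_dot(uint32_t a, uint32_t b)
{
    uint32_t agree = ~(a ^ b);            /* XNOR: lanes with equal sign */
    int pop = __builtin_popcount(agree);  /* number of agreeing lanes   */
    return 2 * pop - 32;                  /* sum over +1/-1 lane values */
}
```

The contrast is the point: `bin_dot` is a handful of single-cycle operations, whereas `unpack_q4` adds per-element shift work before any multiply-accumulate can run, which is what dedicated sub-byte ISA support would eliminate.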
doi:10.1109/codesisss.2018.8525915 dblp:conf/codes/RusciC0B18