Sample-Wise Dynamic Precision Quantization for Neural Network Acceleration

Bowen Li, Dongliang Xiong, Kai Huang, Xiaowen Jiang, Hao Yao, Junjian Chen, Luc Claesen
2022 IEICE Electronics Express  
Quantization is a well-known method for deep neural networks (DNNs) compression and acceleration. In this work, we propose the Sample-Wise Dynamic Precision (SWDP) quantization scheme, which can switch the bit-width of weights and activations in the model according to the task difficulty of input samples at runtime. Using low-precision networks for easy input images brings advantages in terms of computational and energy efficiency. We also propose an adaptive hardware design for the efficient
more » ... plementation of our SWDP networks. The experimental results on various networks and datasets demonstrate that our SWDP achieves an average of 3.3× speedup and 3.0× energy saving over the bit-level dynamically composable architecture BitFusion.
doi:10.1587/elex.19.20220229 fatcat:j3prleu5wjhdfoy5ldew4ygmcq