Efficient Bitwidth Search for Practical Mixed Precision Neural Network
[article]
2020
arXiv
pre-print
Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the overall performance. However, efficiently finding the optimal bitwidth (i.e., precision) for the weights and activations of each layer is challenging. Meanwhile, it remains unclear how to perform convolution for weights and activations of different
arXiv:2003.07577v1
fatcat:7gaprm7ypvdmlpejaycnp7wwv4