A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL. The file type is application/pdf.
Alternating Multi-bit Quantization for Recurrent Neural Networks
[article]
2018 · arXiv pre-print
Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For applications on the server with large-scale concurrent requests, the latency during inference can also be critical given costly computing resources. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1, +1}. We formulate the quantization as …
arXiv:1802.00150v1
fatcat:3qpnjetxn5hkvdk2uqpnpagagq
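
The abstract is cut off in this capture; the full text is available at the arXiv link above. As context for the technique it names, below is a minimal NumPy sketch of multi-bit quantization with alternating refinement: a real-valued weight vector is approximated as a sum of k scaled binary codes, alternating between a least-squares solve for the scales and a per-element lookup of the best sign pattern. This is an illustrative reconstruction, not the authors' implementation; the function and parameter names (greedy_init, alternating_quantize, k, iters) are invented here, and the brute-force enumeration over the 2^k sign patterns stands in for whatever search structure the paper actually uses.

    import itertools
    import numpy as np

    def greedy_init(w, k):
        # Greedy residual quantization: repeatedly take the sign of the
        # residual as the next binary code and its mean magnitude as the
        # (least-squares optimal) scale, then subtract.
        n = w.size
        B = np.empty((n, k))
        alpha = np.empty(k)
        r = w.copy()
        for i in range(k):
            b = np.where(r >= 0, 1.0, -1.0)   # code in {-1, +1}^n
            a = np.abs(r).mean()              # optimal scale for b = sign(r)
            B[:, i] = b
            alpha[i] = a
            r -= a * b
        return alpha, B

    def alternating_quantize(w, k=2, iters=10):
        # Approximate w ~ B @ alpha with B in {-1, +1}^{n x k}.
        w = np.asarray(w, dtype=np.float64).ravel()
        alpha, B = greedy_init(w, k)
        # All 2^k sign patterns; fine for small k (brute force stands in
        # for a smarter lookup structure).
        patterns = np.array(list(itertools.product((-1.0, 1.0), repeat=k)))
        for _ in range(iters):
            # Fix the codes B, solve least squares for the scales alpha.
            alpha, *_ = np.linalg.lstsq(B, w, rcond=None)
            # Fix alpha, pick the closest achievable sum per element.
            values = patterns @ alpha                                 # 2^k candidates
            idx = np.argmin(np.abs(w[:, None] - values[None, :]), axis=1)
            B = patterns[idx]
        return alpha, B

    w = np.random.randn(1000)
    alpha, B = alternating_quantize(w, k=2)
    print("relative error:", np.linalg.norm(B @ alpha - w) / np.linalg.norm(w))

With k = 2 this corresponds to 2-bit quantization of the weights; the approximation error shrinks as k grows, and the alternating iterations can only lower the reconstruction error relative to the greedy initialization, since each half-step is optimal given the other variable.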