Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks
[article]
2017
arXiv
pre-print
Networks and Recurrent Neural Networks. ...
Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory ...
Experiments on Recurrent Neural Networks In this subsection we evaluate the effect of Balanced Quantization on a few Recurrent Neural Networks. ...
arXiv:1706.07145v1
fatcat:ir2u3nmqjjfx7hnovfdcqtj2ai
Quantization Networks
[article]
2019
arXiv
pre-print
The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way. ...
Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. ...
Acknowledgements This work was supported in part by the National Key R&D Program of China under contract No. 2017YFB1002203 and NSFC No. 61872329. ...
arXiv:1911.09464v2
fatcat:ghjswjh6vnbz3ljeev7qb7u4em
Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer
2021
IEEE Access
As a result, recent quantization methods do not provide binarization, thus losing the most resource-efficient option, and quantized and binarized networks have been distinct research areas. ...
Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services. ...
Figure 2 illustrates UniQ in contrast to the conventional approach. This paper first considers two popular weight quantizers in quantized networks. ...
doi:10.1109/access.2021.3067889
fatcat:okciggt7lvb2pfdxgx3gg2w2zq
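As a rough illustration of the idea named in the snippet above (a single symmetric quantizer that also covers binarization), the following hedged Python sketch shows a symmetric uniform quantizer with a step size that could be made learnable; it is a generic stand-in, not UniQ's exact formulation, and the function and parameter names are hypothetical.

    import numpy as np

    def symmetric_quantize(w, step, bits):
        # Symmetric uniform quantization with a (potentially learnable) step size.
        # For bits == 1 this degenerates to two levels {-step, +step}, i.e.
        # binarization, so one quantizer covers both multi-bit and binary cases.
        if bits == 1:
            return np.sign(w) * step
        qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed values
        q = np.clip(np.round(w / step), -qmax, qmax)
        return q * step                                  # dequantized ("fake-quantized") weights

    w = np.random.randn(8)
    print(symmetric_quantize(w, step=0.1, bits=4))
    print(symmetric_quantize(w, step=np.abs(w).mean(), bits=1))

In training, the rounding would typically be bypassed with a straight-through estimator so that gradients can reach both the weights and the step size.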
SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training
[article]
2019
arXiv
pre-print
Deep quantization of neural networks (below eight bits) offers significant promise in reducing their compute and storage cost. ...
We carry out experimentation using the AlexNet, CIFAR-10, ResNet-18, ResNet-20, SVHN, and VGG-11 DNNs with three to five bits for quantization and show the versatility of SinReQ in enhancing multiple quantized ...
Semiconductor Research Corporation contract #2019-SD-2884, NSF awards CNS#1703812, ECCS#1609823, Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) award #FA9550-17-1-0274, and ...
arXiv:1905.01416v3
fatcat:3i32427r2vayvo32ufmlkh5vrq
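The SinReQ snippet describes a sinusoidal regularizer for low-bitwidth training. A minimal sketch of the general idea, assuming a uniform b-bit grid on [-w_max, w_max] (the paper's exact form and scheduling may differ; the names below are hypothetical):

    import numpy as np

    def sin_regularizer(w, bits, w_max=1.0):
        # Periodic penalty that vanishes exactly at the quantization grid points:
        # sin^2(pi * w / delta) is zero whenever w is an integer multiple of the
        # grid spacing delta, so adding this term to the loss pulls weights
        # toward quantization-friendly values during training.
        delta = 2.0 * w_max / (2 ** bits - 1)
        return np.mean(np.sin(np.pi * w / delta) ** 2)

    w = np.random.uniform(-1, 1, size=1000)
    print(sin_regularizer(w, bits=3))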
Quantized Convolutional Neural Networks for Mobile Devices
[article]
2016
arXiv
pre-print
In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. ...
Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. ...
Quantization with Error Correction So far, we have presented an intuitive approach to quantize parameters and improve the test-phase efficiency of convolutional networks. ...
arXiv:1512.06473v3
fatcat:7ioc6nsqqne73iuldfozqtqmbu
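Purely as a generic illustration of the kind of parameter quantization that the snippet's "error correction" step would refine (this is scalar codebook quantization, not the paper's specific sub-vector scheme; all names are hypothetical):

    import numpy as np

    def codebook_quantize(w, n_levels=16, iters=10):
        # Quantize a weight vector against a small k-means style codebook.
        centers = np.linspace(w.min(), w.max(), n_levels)        # initial codebook
        for _ in range(iters):
            idx = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
            for k in range(n_levels):
                if np.any(idx == k):
                    centers[k] = w[idx == k].mean()               # Lloyd/k-means update
        return centers[idx], centers

    w = np.random.randn(4096)
    wq, centers = codebook_quantize(w)
    print("reconstruction MSE:", np.mean((w - wq) ** 2))          # the error a correction step targets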
Deep Multiple Description Coding by Learning Scalar Quantization
[article]
2019
arXiv
pre-print
Thirdly, a pair of scalar quantizers accompanied by two importance-indicator maps is automatically learned in an end-to-end self-supervised way. ...
Secondly, two entropy estimation networks are learned to estimate the informative amounts of the quantized tensors, which can further supervise the learning of multiple description encoder network to represent ...
description scalar quantization, and context-based entropy estimation neural network. ...
arXiv:1811.01504v3
fatcat:27siiv6javantau7eto2dxwqv4
Joint Neural Architecture Search and Quantization
[article]
2018
arXiv
pre-print
Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices. ...
In experiments, we find that our approach outperforms the methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide ...
The gap between JASQNet and JASQNet (float) shows the effectiveness of our quantization policy. JASQNet reaches a better balance point than other models. ...
arXiv:1811.09426v1
fatcat:jj5kspr46zbknjysobfvnnmnui
UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks
[article]
2018
arXiv
pre-print
Our approach provides a novel alternative to the existing uniform quantization techniques for neural networks. ...
We present a novel method for neural network quantization that emulates a non-uniform k-quantile quantizer, which adapts to the distribution of the quantized parameters. ...
The scheme is amenable to efficient training by back propagation in full precision arithmetic, and achieves maximum efficiency with the k-quantile (balanced) quantizer that was investigated in this paper ...
arXiv:1804.10969v3
fatcat:hpzkvj2m6vhc5oj4kczl6hsfdy
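The snippet explicitly names a k-quantile (balanced) quantizer that adapts to the distribution of the parameters. A small sketch of a plain k-quantile quantizer, without the uniform-noise-injection training trick the title refers to (helper names are hypothetical):

    import numpy as np

    def quantile_quantize(w, bits):
        # k-quantile ("balanced") quantizer: each of the 2**bits bins holds
        # roughly the same fraction of the weights, so the levels adapt to the
        # empirical distribution instead of being uniformly spaced.
        k = 2 ** bits
        edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))      # bin boundaries
        levels = 0.5 * (edges[:-1] + edges[1:])                   # one level per bin
        idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
        return levels[idx]

    w = np.random.randn(10000)
    wq = quantile_quantize(w, bits=2)
    print(np.unique(wq, return_counts=True))                      # bin counts come out near-balanced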
FracBits: Mixed Precision Quantization via Fractional Bit-Widths
[article]
2020
arXiv
pre-print
Model quantization helps to reduce model size and latency of deep neural networks. ...
Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. ...
Network Pruning Network pruning is an approach orthogonal to quantization for speeding up inference of neural networks. ...
arXiv:2007.02017v2
fatcat:smacowlqinaivg4s5t625epvzq
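FracBits makes the bit-width itself a continuous search variable. A hedged sketch of the usual way to give a fractional bit-width meaning: interpolate between the two neighbouring integer bit-widths (uniform_quantize below is a placeholder helper, not the paper's code):

    import numpy as np

    def uniform_quantize(x, bits):
        # Plain b-bit uniform quantizer on [-1, 1] (placeholder helper).
        levels = 2 ** bits - 1
        return np.round(np.clip(x, -1.0, 1.0) * levels / 2.0) * 2.0 / levels

    def frac_quantize(x, b):
        # Fractional bit-width b: blend the outputs of the floor/ceil integer
        # bit-widths, weighted by the fractional part of b, so b can be
        # optimized with gradients and rounded at the end of the search.
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            return uniform_quantize(x, lo)
        frac = b - lo
        return (1.0 - frac) * uniform_quantize(x, lo) + frac * uniform_quantize(x, hi)

    x = np.random.uniform(-1, 1, size=5)
    print(frac_quantize(x, 3.3))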
Fully Quantized Transformer for Machine Translation
[article]
2020
arXiv
pre-print
To this end, we propose FullyQT: an all-inclusive quantization strategy for the Transformer. ...
To the best of our knowledge, we are the first to show that it is possible to avoid any loss in translation quality with a fully quantized Transformer. ...
Wu et al. (2015) apply quantization to both kernels and fully connected layers of convolutional neural networks. ...
arXiv:1910.10485v3
fatcat:onqybviiavd6dk2usjjy6stcya
A High-Performance Adaptive Quantization Approach for Edge CNN Applications
[article]
2021
arXiv
pre-print
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications. ...
In this paper, we hence introduce an adaptive high-performance quantization method to resolve the issue of biased activation by dynamically adjusting the scaling and shifting factors based on the task ...
Language Model We also evaluate the effectiveness of our approach on natural language processing (NLP) tasks by applying our method to recurrent neural networks (RNN) [50] [51] for language modeling. ...
arXiv:2107.08382v1
fatcat:7ekolduuzbcm7lc65ivnjn7phq
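The snippet attributes the accuracy issue to biased activations and resolves it by adjusting scaling and shifting factors. As a generic, hedged illustration (not the paper's adaptive policy), an asymmetric quantizer derives a scale and a zero-point shift from observed activation statistics so that a biased distribution still uses the full integer range:

    import numpy as np

    def asymmetric_quantize(x, bits=8):
        # Scale + zero-point quantization driven by observed min/max statistics.
        qmin, qmax = 0, 2 ** bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = np.round(qmin - x.min() / scale)
        q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
        return (q - zero_point) * scale, scale, zero_point

    x = np.random.randn(1000) + 3.0          # deliberately biased activations
    xq, s, zp = asymmetric_quantize(x)
    print("max abs error:", np.abs(x - xq).max())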
Loss Aware Post-training Quantization
[article]
2020
arXiv
pre-print
Neural network quantization enables the deployment of large models on resource-constrained devices. ...
Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. ...
Acknowledgments The research was funded by National Cyber Security Authority and the Hiroshi Fujiwara Technion Cyber Security Research Center.
arXiv:1911.07190v2
fatcat:b64bbrxxqvbhzpeqqh4yx4xjyq
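Post-training quantization methods like the one in this record typically work from a small calibration set rather than retraining. A hedged, generic sketch of one such step, choosing a clipping range that minimizes quantization error (a stand-in objective, not the paper's loss-aware formulation):

    import numpy as np

    def quantize_with_clip(x, clip, bits):
        # Symmetric uniform quantization of x clipped to [-clip, clip].
        qmax = 2 ** (bits - 1) - 1
        scale = clip / qmax
        return np.clip(np.round(x / scale), -qmax, qmax) * scale

    def search_clip(x, bits, n_grid=100):
        # Grid-search the clipping threshold that minimizes quantization MSE
        # on a calibration tensor -- a simple proxy for a loss-aware objective.
        candidates = np.linspace(0.1, 1.0, n_grid) * np.abs(x).max()
        errors = [np.mean((x - quantize_with_clip(x, c, bits)) ** 2) for c in candidates]
        return candidates[int(np.argmin(errors))]

    x = np.random.randn(10000)
    print("best 4-bit clip:", search_clip(x, bits=4))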
Towards Efficient Training for Neural Network Quantization
[article]
2019
arXiv
pre-print
To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training. ...
Quantization reduces computation costs of neural networks but suffers from performance degeneration. ...
To make deep neural networks more efficient on model size, latency and energy, people have developed several approaches such as weight pruning [12] , model slimming [22, 48] , and quantization [5, ...
arXiv:1912.10207v1
fatcat:os6rhnm7cna2ld74xopfnxgr3e
Balanced Binary Neural Networks with Gated Residual
[article]
2020
arXiv
pre-print
In this paper, we attempt to maintain the information propagated in the forward process and propose Balanced Binary Neural Networks with Gated Residual (BBG for short). ...
Binary neural networks have attracted considerable attention in recent years. ...
In recent years, a number of approaches have been proposed to learn portable deep neural networks, including multi-bit quantization [1] , pruning [2] , lightweight architecture design [3] , knowledge ...
arXiv:1909.12117v2
fatcat:wh6xumkmunfixmugptaokdxg4i
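As a hypothetical sketch of the two ingredients named in the BBG record, sign binarization with a scaling factor and a gated shortcut that reinjects some full-precision information in the forward pass (the gate and its placement are assumptions, not the paper's architecture):

    import numpy as np

    def binarize(w):
        # Sign binarization with a per-tensor scale alpha = mean(|w|),
        # which preserves the first moment of the real-valued weights.
        alpha = np.abs(w).mean()
        return alpha * np.sign(w)

    def gated_residual(x, main_out, gate):
        # Hypothetical gated shortcut: the (learnable) gate controls how much
        # of the full-precision input is added back to the binary branch.
        return main_out + gate * x

    w = np.random.randn(4, 4)
    x = np.random.randn(4, 4)
    print(gated_residual(x, x @ binarize(w), gate=0.5))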
Quantized Convolutional Neural Networks for Mobile Devices
2016
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. ...
Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. ...
Quantization with Error Correction So far, we have presented an intuitive approach to quantize parameters and improve the test-phase efficiency of convolutional networks. ...
doi:10.1109/cvpr.2016.521
dblp:conf/cvpr/WuLWHC16
fatcat:yooibokbcvhxlhqdmjw7swegoq
Showing results 1 — 15 out of 9,509 results