9,509 Hits in 4.1 sec

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks [article]

Shuchang Zhou, Yuzhi Wang, He Wen, Qinyao He, Yuheng Zou
2017 arXiv   pre-print
Networks and Recurrent Neural Networks.  ...  Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory  ...  Experiments on Recurrent Neural Networks In this subsection we evaluate the effect of Balanced Quantization on a few Recurrent Neural Networks.  ... 
arXiv:1706.07145v1 fatcat:ir2u3nmqjjfx7hnovfdcqtj2ai
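
Editor's note: the entry above (and the UNIQ entry later in this list) refers to "balanced" quantization, i.e. placing quantization thresholds so that each bin holds roughly the same number of values rather than spanning equal widths. A minimal sketch of that idea follows; function and variable names are illustrative, not taken from the paper.

    import numpy as np

    def balanced_quantize(w, bits=2):
        """Percentile-based (balanced) quantization sketch.

        Thresholds sit at equal-frequency percentiles, so every bin
        contains roughly the same number of entries of `w`; each bin is
        represented by its median.
        """
        k = 2 ** bits                      # number of quantization levels
        flat = w.ravel()
        # equal-frequency bin edges (k + 1 edges -> k bins)
        edges = np.percentile(flat, np.linspace(0, 100, k + 1))
        # bin index of each weight, clipped to [0, k - 1]
        idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, k - 1)
        # representative value (median) per bin
        levels = np.array([np.median(flat[idx == i]) if np.any(idx == i)
                           else edges[i] for i in range(k)])
        return levels[idx].reshape(w.shape)

    # usage: quantize one layer's weights to 4 balanced levels
    w = np.random.randn(64, 128).astype(np.float32)
    w_q = balanced_quantize(w, bits=2)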

Quantization Networks [article]

Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua
2019 arXiv   pre-print
The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way.  ...  Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices.  ...  Acknowledgements This work was supported in part by the National Key R&D Program of China under contract No. 2017YFB1002203 and NSFC No. 61872329.  ... 
arXiv:1911.09464v2 fatcat:ghjswjh6vnbz3ljeev7qb7u4em
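
Editor's note: the entry above describes a quantization function that is itself learned end-to-end. A common way to make a quantizer differentiable is to express the staircase as a sum of shifted sigmoids whose temperature controls how hard the steps are; the sketch below illustrates that general idea and is not the paper's exact formulation.

    import torch

    def soft_quantize(x, levels, temperature=10.0):
        """Differentiable 'soft' quantizer built from shifted sigmoids.

        `levels` are the target quantization values; each step between
        adjacent levels is approximated by a sigmoid so gradients can
        flow. As `temperature` grows, the function approaches a hard
        staircase.
        """
        levels = torch.as_tensor(levels, dtype=x.dtype)
        out = torch.full_like(x, float(levels[0]))
        for lo, hi in zip(levels[:-1], levels[1:]):
            midpoint = (lo + hi) / 2
            out = out + (hi - lo) * torch.sigmoid(temperature * (x - midpoint))
        return out

    x = torch.randn(8, requires_grad=True)
    y = soft_quantize(x, levels=[-1.0, 0.0, 1.0], temperature=25.0)
    y.sum().backward()   # gradients exist, unlike with a hard round()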

Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer

Phuoc Pham, Jacob A. Abraham, Jaeyong Chung
2021 IEEE Access  
As a result, recent quantization methods do not provide binarization, thus losing the most resource-efficient option, and quantized and binarized networks have been distinct research areas.  ...  Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services.  ...  Figure 2 illustrates UniQ in contrast to the conventional approach. This paper first considers two popular weight quantizers in quantized networks.  ... 
doi:10.1109/access.2021.3067889 fatcat:okciggt7lvb2pfdxgx3gg2w2zq
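
Editor's note: a minimal sketch of a symmetric uniform quantizer with a learnable step size, trained through a straight-through estimator. It illustrates the generic "learnable symmetric quantizer" idea rather than UniQ's exact formulation; all names are placeholders.

    import torch
    import torch.nn as nn

    class SymmetricQuantizer(nn.Module):
        """Symmetric uniform quantizer with a learnable step size.

        The forward pass rounds to multiples of `step`; the rounding is
        bypassed in the backward pass (straight-through estimator) so
        both the weights and `step` receive gradients.
        """
        def __init__(self, bits=4, init_step=0.05):
            super().__init__()
            self.levels = 2 ** (bits - 1) - 1          # e.g. 7 for 4 bits
            self.step = nn.Parameter(torch.tensor(init_step))

        def forward(self, x):
            scaled = torch.clamp(x / self.step, -self.levels, self.levels)
            rounded = torch.round(scaled)
            # straight-through: forward uses rounded, backward sees scaled
            q = scaled + (rounded - scaled).detach()
            return q * self.step

    quant = SymmetricQuantizer(bits=4)
    w = torch.randn(16, 16, requires_grad=True)
    loss = quant(w).pow(2).mean()
    loss.backward()    # both w.grad and quant.step.grad are populated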

SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training [article]

Ahmed T. Elthakeb, Prannoy Pilligundla, Hadi Esmaeilzadeh
2019 arXiv   pre-print
Deep quantization of neural networks (below eight bits) offers significant promise in reducing their compute and storage cost.  ...  We carry out experimentation using the AlexNet, CIFAR-10, ResNet-18, ResNet-20, SVHN, and VGG-11 DNNs with three to five bits for quantization and show the versatility of SinReQ in enhancing multiple quantized  ...  Semiconductor Research Corporation contract #2019-SD-2884, NSF awards CNS#1703812, ECCS#1609823, Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) award #FA9550-17-1-0274, and  ... 
arXiv:1905.01416v3 fatcat:3i32427r2vayvo32ufmlkh5vrq
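
Editor's note: the entry above regularizes training with a sinusoidal penalty that pulls weights toward the quantization grid. A minimal sketch under that reading; the step size and regularization strength are illustrative hyperparameters, not values from the paper.

    import torch

    def sin_quant_penalty(weights, step=0.1, strength=1e-4):
        """Sinusoidal quantization regularizer (sketch).

        sin^2(pi * w / step) vanishes exactly at multiples of `step`, so
        adding it to the task loss nudges weights toward the quantization
        grid while remaining fully differentiable.
        """
        penalty = torch.sin(torch.pi * weights / step).pow(2).sum()
        return strength * penalty

    # usage inside a training loop (sketch):
    # loss = task_loss + sum(sin_quant_penalty(p) for p in model.parameters())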

Quantized Convolutional Neural Networks for Mobile Devices [article]

Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
2016 arXiv   pre-print
In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models.  ...  Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks.  ...  Quantization with Error Correction So far, we have presented an intuitive approach to quantize parameters and improve the test-phase efficiency of convolutional networks.  ... 
arXiv:1512.06473v3 fatcat:7ioc6nsqqne73iuldfozqtqmbu
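
Editor's note: the snippet above mentions quantizing parameters with error correction. One standard building block for parameter quantization is a learned codebook (k-means style), sketched below for a single weight matrix; this is a generic illustration, not the paper's specific error-corrected scheme.

    import numpy as np

    def codebook_quantize(w, k=16, iters=20):
        """k-means codebook quantization of a weight matrix (sketch).

        Every weight is replaced by the nearest of `k` learned centroids,
        so the layer can be stored as k floats plus small integer indices.
        """
        flat = w.ravel()
        # initialize centroids evenly over the value range
        centroids = np.linspace(flat.min(), flat.max(), k)
        for _ in range(iters):
            idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
            for j in range(k):
                if np.any(idx == j):
                    centroids[j] = flat[idx == j].mean()
        return centroids[idx].reshape(w.shape), centroids, idx

    w = np.random.randn(256, 256).astype(np.float32)
    w_q, codebook, assignments = codebook_quantize(w, k=16)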

Deep Multiple Description Coding by Learning Scalar Quantization [article]

Lijun Zhao, Huihui Bai, Anhong Wang, Yao Zhao
2019 arXiv   pre-print
Thirdly, a pair of scalar quantizers accompanied by two importance-indicator maps is automatically learned in an end-to-end self-supervised way.  ...  Secondly, two entropy estimation networks are learned to estimate the informative amounts of the quantized tensors, which can further supervise the learning of multiple description encoder network to represent  ...  description scalar quantization, and context-based entropy estimation neural network.  ... 
arXiv:1811.01504v3 fatcat:27siiv6javantau7eto2dxwqv4

Joint Neural Architecture Search and Quantization [article]

Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, Chunhong Pan
2018 arXiv   pre-print
Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices.  ...  In experiments, we find that our approach outperforms the methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide  ...  The gap between JASQNet and JASQNet (float) shows the effectiveness of our quantization policy. JASQNet reaches a better balance point than other models.  ... 
arXiv:1811.09426v1 fatcat:jj5kspr46zbknjysobfvnnmnui

UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks [article]

Chaim Baskin, Eli Schwartz, Evgenii Zheltonozhskii, Natan Liss, Raja Giryes, Alex M. Bronstein, Avi Mendelson
2018 arXiv   pre-print
Our approach provides a novel alternative to the existing uniform quantization techniques for neural networks.  ...  We present a novel method for neural network quantization that emulates a non-uniform k-quantile quantizer, which adapts to the distribution of the quantized parameters.  ...  The scheme is amenable to efficient training by back propagation in full precision arithmetic, and achieves maximum efficiency with the k-quantile (balanced) quantizer that was investigated in this paper  ... 
arXiv:1804.10969v3 fatcat:hpzkvj2m6vhc5oj4kczl6hsfdy
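
Editor's note: the entry above replaces hard rounding during training with uniform noise injection into the weights. A minimal sketch of that idea on a uniform grid follows (the paper's quantizer is non-uniform/k-quantile; this is deliberately simplified).

    import torch

    def noisy_quantize_train(w, step=0.05):
        """Training-time proxy for quantization via uniform noise injection.

        Instead of rounding (non-differentiable), add noise drawn uniformly
        from half a quantization bin on each side; the perturbation mimics
        the rounding error while gradients flow through `w` untouched.
        """
        noise = (torch.rand_like(w) - 0.5) * step
        return w + noise

    def quantize_eval(w, step=0.05):
        """Inference-time hard quantization onto the same grid."""
        return torch.round(w / step) * step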

FracBits: Mixed Precision Quantization via Fractional Bit-Widths [article]

Linjie Yang, Qing Jin
2020 arXiv   pre-print
Model quantization helps to reduce the model size and latency of deep neural networks.  ...  Mixed precision quantization is favorable with customized hardware supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency.  ...  Network Pruning Network pruning is an approach orthogonal to quantization for speeding up inference of neural networks.  ... 
arXiv:2007.02017v2 fatcat:smacowlqinaivg4s5t625epvzq
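
Editor's note: the entry above treats the bit-width itself as a continuous quantity so it can be searched by gradient methods. One way to realize a fractional bit-width is to interpolate between quantization at floor(b) and ceil(b) bits; the sketch below follows that reading of the title and is not necessarily the paper's exact formulation.

    import math
    import torch

    def uniform_quantize(x, bits):
        """Plain uniform quantization of x in [0, 1] at an integer bit-width."""
        levels = 2 ** bits - 1
        return torch.round(x * levels) / levels

    def fractional_bit_quantize(x, b):
        """Quantize at a fractional bit-width b (sketch).

        The output is a convex combination of quantizing at floor(b) and
        ceil(b) bits, making the accuracy/size trade-off smooth in b.
        """
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            return uniform_quantize(x, lo)
        frac = b - lo
        return (1 - frac) * uniform_quantize(x, lo) + frac * uniform_quantize(x, hi)

    x = torch.rand(8)                      # activations assumed in [0, 1]
    y = fractional_bit_quantize(x, b=3.3)  # blends 3-bit and 4-bit quantization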

Fully Quantized Transformer for Machine Translation [article]

Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh
2020 arXiv   pre-print
To this end, we propose FullyQT: an all-inclusive quantization strategy for the Transformer.  ...  To the best of our knowledge, we are the first to show that it is possible to avoid any loss in translation quality with a fully quantized Transformer.  ...  Wu et al. (2015) apply quantization to both kernels and fully connected layers of convolutional neural networks.  ... 
arXiv:1910.10485v3 fatcat:onqybviiavd6dk2usjjy6stcya

A High-Performance Adaptive Quantization Approach for Edge CNN Applications [article]

Hsu-Hsun Chin, Ren-Song Tsay, Hsin-I Wu
2021 arXiv   pre-print
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications.  ...  In this paper, we hence introduce an adaptive high-performance quantization method to resolve the issue of biased activation by dynamically adjusting the scaling and shifting factors based on the task  ...  Language Model We also evaluate the effectiveness of our approach to natural language processing (NLP) tasks by applying our method to recurrent neural networks (RNN) [50], [51] for language modeling.  ... 
arXiv:2107.08382v1 fatcat:7ekolduuzbcm7lc65ivnjn7phq
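
Editor's note: the entry above adapts scaling and shifting factors to the observed activations. A minimal sketch of such asymmetric (scale plus zero-point) activation quantization derived from the current tensor's statistics; the details are illustrative, not the paper's algorithm.

    import numpy as np

    def adaptive_activation_quantize(a, bits=8):
        """Asymmetric activation quantization with data-derived scale/shift.

        The scale and zero-point are recomputed from the current activation
        tensor, so shifted (biased) activation distributions still use the
        full integer range.
        """
        qmax = 2 ** bits - 1
        lo, hi = float(a.min()), float(a.max())
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = int(round(-lo / scale))
        q = np.clip(np.round(a / scale) + zero_point, 0, qmax).astype(np.uint8)
        # dequantized values, as they would feed the next layer
        return (q.astype(np.float32) - zero_point) * scale, scale, zero_point

    acts = np.random.rand(1, 32, 14, 14).astype(np.float32) * 3.0 + 1.0
    deq, scale, zp = adaptive_activation_quantize(acts)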

Loss Aware Post-training Quantization [article]

Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
2020 arXiv   pre-print
Neural network quantization enables the deployment of large models on resource-constrained devices.  ...  Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results.  ...  Acknowledgments The research was funded by National Cyber Security Authority and the Hiroshi Fujiwara Technion Cyber Security Research Center. Bibliography  ... 
arXiv:1911.07190v2 fatcat:b64bbrxxqvbhzpeqqh4yx4xjyq
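
Editor's note: post-training quantization methods such as the one above typically pick a clipping range per layer by minimizing some distortion measure on a small calibration set. The sketch below uses a plain MSE proxy for that search; the paper's loss-aware objective is more refined, and all names here are illustrative.

    import numpy as np

    def quantize_clipped(x, clip, bits=4):
        """Uniform symmetric quantization of x clipped to [-clip, clip]."""
        levels = 2 ** (bits - 1) - 1
        step = clip / levels
        return np.clip(np.round(x / step), -levels, levels) * step

    def search_clip(x, bits=4, grid=100):
        """Pick the clipping value that minimizes quantization MSE (sketch)."""
        best_clip, best_err = None, np.inf
        for clip in np.linspace(0.1, 1.0, grid) * np.abs(x).max():
            err = np.mean((x - quantize_clipped(x, clip, bits)) ** 2)
            if err < best_err:
                best_clip, best_err = clip, err
        return best_clip

    w = np.random.randn(512)
    clip = search_clip(w, bits=4)
    w_q = quantize_clipped(w, clip, bits=4)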

Towards Efficient Training for Neural Network Quantization [article]

Qing Jin, Linjie Yang, Zhenyu Liao
2019 arXiv   pre-print
To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.  ...  Quantization reduces computation costs of neural networks but suffers from performance degeneration.  ...  To make deep neural networks more efficient in model size, latency and energy, people have developed several approaches such as weight pruning [12], model slimming [22, 48], and quantization [5,  ... 
arXiv:1912.10207v1 fatcat:os6rhnm7cna2ld74xopfnxgr3e

Balanced Binary Neural Networks with Gated Residual [article]

Mingzhu Shen and Xianglong Liu and Ruihao Gong and Kai Han
2020 arXiv   pre-print
In this paper, we attempt to maintain the information propagated in the forward process and propose Balanced Binary Neural Networks with Gated Residual (BBG for short).  ...  Binary neural networks have attracted much attention in recent years.  ...  In recent years, a number of approaches have been proposed to learn portable deep neural networks, including multibit quantization [1], pruning [2], and lightweight architecture design [3], knowledge  ... 
arXiv:1909.12117v2 fatcat:wh6xumkmunfixmugptaokdxg4i
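
Editor's note: the entry above binarizes weights and activations and adds a gated residual path to preserve information lost by binarization. A minimal sketch of that combination follows; the layer shapes and the sigmoid channel gate are illustrative assumptions, not the paper's exact block.

    import torch
    import torch.nn as nn

    def binarize(x):
        """Sign binarization with a straight-through gradient estimator."""
        return x + (torch.sign(x) - x).detach()

    class GatedResidualBinaryConv(nn.Module):
        """Binary conv block with a learnable channel gate on the residual path."""
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn = nn.BatchNorm2d(channels)
            self.gate = nn.Parameter(torch.ones(1, channels, 1, 1))

        def forward(self, x):
            w_bin = binarize(self.conv.weight)
            out = nn.functional.conv2d(binarize(x), w_bin, padding=1)
            # gated residual keeps full-precision information flowing forward
            return self.bn(out) + torch.sigmoid(self.gate) * x

    block = GatedResidualBinaryConv(channels=16)
    y = block(torch.randn(2, 16, 8, 8))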

Quantized Convolutional Neural Networks for Mobile Devices

Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models.  ...  Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks.  ...  Quantization with Error Correction So far, we have presented an intuitive approach to quantize parameters and improve the test-phase efficiency of convolutional networks.  ... 
doi:10.1109/cvpr.2016.521 dblp:conf/cvpr/WuLWHC16 fatcat:yooibokbcvhxlhqdmjw7swegoq
Showing results 1 — 15 out of 9,509 results