3,732 Hits in 4.3 sec

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks [article]

Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda
2016 arXiv   pre-print
In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics including memory footprint, power and energy consumption, and design area.  ...  We investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint.  ...  ACKNOWLEDGMENT This work is supported by NSF grant 1420864 and by NVIDIA Corporation for their generous GPU donation. We also thank Professor Pedro Felzenszwalb for his helpful inputs.  ... 
arXiv:1612.03940v1 fatcat:c432ganlkjdpjjghhi3cddqncm
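As a rough illustration of the precision-scaling knob studied in the entry above, the following minimal numpy sketch (not the authors' benchmark; the function name, bit-widths, and random weights are illustrative) quantizes a weight tensor uniformly at several bit-widths and reports the quantization error, the quantity that ultimately drives the accuracy loss the paper measures.

    import numpy as np

    def quantize_uniform(w, bits):
        # Symmetric uniform quantizer: snap w onto 2**bits evenly spaced levels.
        qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
        scale = np.max(np.abs(w)) / qmax      # one scale per tensor
        q = np.clip(np.round(w / scale), -qmax, qmax)
        return q * scale                      # dequantized ("fake-quantized") weights

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)
    for bits in (8, 6, 4, 2):
        err = np.mean((w - quantize_uniform(w, bits)) ** 2)
        print(f"{bits}-bit weights, mean squared quantization error: {err:.2e}")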

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Qing Qin, Jie Ren, JiaLong Yu, Hai Wang, Ling Gao, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
2018 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)  
We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures and the implications of compression techniques for the model  ...  We perform extensive experiments by considering 11 influential neural network architectures from the image classification and the natural language processing domains.  ...  ; the UK EPSRC through grant agreements EP/M01567X/1 (SANDeRs) and EP/M015793/1 (DIVIDEND); and the Royal Society International Collaboration Grant (IE161012).  ... 
doi:10.1109/bdcloud.2018.00110 dblp:conf/ispa/QinRYWG0FFW18 fatcat:q6zjgqhqcngplgl67lh7fnjwsm

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference [article]

Qing Qin, Jie Ren, Jialong Yu, Ling Gao, Hai Wang, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
2018 arXiv   pre-print
We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures and the implications of compression techniques for the model  ...  We perform extensive experiments by considering 11 influential neural network architectures from the image classification and the natural language processing domains.  ...  ; the UK EPSRC through grant agreements EP/M01567X/1 (SANDeRs) and EP/M015793/1 (DIVIDEND); and the Royal Society International Collaboration Grant (IE161012).  ... 
arXiv:1810.08899v1 fatcat:ewhclsajprh7tcp7kylxkftf3m
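Both records above characterize the same two compression techniques. The sketch below is an illustrative toy, not the paper's evaluation pipeline: it applies them to a single weight matrix, magnitude pruning followed by 8-bit post-training quantization, with all thresholds and sizes chosen arbitrarily.

    import numpy as np

    def prune_by_magnitude(w, sparsity):
        # Zero out the smallest-magnitude fraction `sparsity` of the weights.
        threshold = np.quantile(np.abs(w), sparsity)
        return np.where(np.abs(w) < threshold, 0.0, w)

    def quantize_8bit(w):
        # Post-training symmetric quantization to int8 with a single scale.
        scale = np.max(np.abs(w)) / 127.0
        return np.round(w / scale).astype(np.int8), scale

    rng = np.random.default_rng(1)
    w = rng.normal(0.0, 0.1, size=(512, 512))
    q, scale = quantize_8bit(prune_by_magnitude(w, sparsity=0.7))
    print("nonzero weights kept:", np.count_nonzero(q), "of", q.size)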

Dynamic Precision Analog Computing for Neural Networks [article]

Sahaj Garg, Joe Lou, Anirudh Jain, Mitchell Nahmias
2021 arXiv   pre-print
In one example, we apply dynamic precision to a shot-noise limited homodyne optical neural network and simulate inference at an optical energy consumption of 2.7 aJ/MAC for Resnet50 and 1.6 aJ/MAC for  ...  We propose extending analog computing architectures to support varying levels of precision by repeating operations and averaging the result, decreasing the impact of noise.  ...  ACKNOWLEDGMENT We would like to thank many members of the Luminous Computing team, including Michael Gao, Matthew Chang, Rodolfo Camacho-Aguilera, Rohun Saxena, Katherine Roelofs, and Patrick Gallagher  ... 
arXiv:2102.06365v1 fatcat:f7sur2nubzdpdoej77z4mm6cna
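The core idea quoted above, repeating an analog operation and averaging to suppress noise, can be pictured with a small simulation (assuming additive Gaussian read-out noise per matrix-vector product; the numbers and names are illustrative, not the authors' model). The RMS error falls roughly as 1/sqrt(repeats), which is the precision-versus-energy dial the paper exploits.

    import numpy as np

    def noisy_dot(x, w, noise_std, repeats, rng):
        # Simulate an analog matrix-vector product corrupted by Gaussian noise,
        # repeated `repeats` times and averaged to recover precision.
        exact = x @ w
        samples = [exact + rng.normal(0.0, noise_std, size=exact.shape)
                   for _ in range(repeats)]
        return np.mean(samples, axis=0)

    rng = np.random.default_rng(2)
    x = rng.normal(size=(1, 128))
    w = rng.normal(size=(128, 10))
    exact = x @ w
    for repeats in (1, 4, 16):
        est = noisy_dot(x, w, noise_std=0.5, repeats=repeats, rng=rng)
        rms = np.sqrt(np.mean((est - exact) ** 2))
        print(f"{repeats:2d} repeats -> RMS error {rms:.3f}")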

Low-bit Shift Network for End-to-End Spoken Language Understanding [article]

Anderson R. Avila, Khalil Bibi, Rui Heng Yang, Xinlin Li, Chao Xing, Xiao Chen
2022 arXiv   pre-print
In order to mitigate the high computation, memory, and power requirements of inferring convolutional neural networks (CNNs), we propose the use of power-of-two quantization, which quantizes continuous  ...  Experimental results show improved performance for shift neural network architectures, with our low-bit quantization achieving 98.76% on the test set, which is comparable performance to its full-precision  ...  Quantization had a mild impact on the overall performance.  ... 
arXiv:2207.07497v1 fatcat:63vsuyqvufe4fc526que47sk3a
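Power-of-two quantization, as described in the abstract above, snaps each weight to a signed power of two so that multiplications reduce to bit shifts in integer hardware. A minimal sketch follows (the exponent range and rounding rule are assumptions, not the authors' implementation):

    import numpy as np

    def quantize_power_of_two(w, min_exp=-7, max_exp=0):
        # Round each |weight| to the nearest power of two (in log2 space),
        # keeping the sign; multiplying by 2**exp is a shift in hardware.
        sign = np.sign(w)
        exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
        return sign * 2.0 ** exp

    rng = np.random.default_rng(3)
    w = rng.normal(0.0, 0.2, size=8)
    print("original  :", np.round(w, 3))
    print("power-of-2:", quantize_power_of_two(w))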

Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case [article]

Florian Rehm, Sofia Vallecorsa, Vikram Saletore, Hans Pabst, Adel Chaibi, Valeriu Codreanu, Kerstin Borras, Dirk Krücker
2021 arXiv   pre-print
A promising approach to make deep learning more efficient is to quantize the parameters of the neural networks to reduced precision.  ...  In this paper we analyse the effects of low precision inference on a complex deep generative adversarial network model.  ...  ACKNOWLEDGEMENTS This work has been sponsored by the Wolfgang Gentner Programme of the German Federal Ministry of Education and Research.  ... 
arXiv:2103.10142v1 fatcat:yl7ddmqdszfrphvoe25qzr4ipy

Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case

Florian Rehm, Sofia Vallecorsa, Vikram Saletore, Hans Pabst, Adel Chaibi, Valeriu Codreanu, Kerstin Borras, Dirk Krücker
2021 ICPRAM 2021: Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods: online streaming  
A promising approach to make deep learning more efficient is to quantize the parameters of the neural networks to reduced precision.  ...  In this paper we analyse the effects of low precision inference on a complex deep generative adversarial network model.  ...  ACKNOWLEDGEMENTS This work has been sponsored by the Wolfgang Gentner Programme of the German Federal Ministry of Education and Research.  ... 
doi:10.18154/rwth-2021-06679 fatcat:kbibzmauerb5jkcgr765jxp3jy

Neural Network Quantization with Scale-Adjusted Training

Qing Jin, Linjie Yang, Zhenyu Liao, Xiaoning Qian
2020 British Machine Vision Conference  
With the proposed technique, quantized networks can demonstrate better performance than their full-precision counterparts, and we achieve state-of-the-art accuracy with consistent improvement over previous  ...  However, previous works generally achieve network quantization by sacrificing prediction accuracy with respect to their full-precision counterparts.  ...  In this paper, we study this problem by identifying the key factor impacting the prediction accuracy of quantized neural networks.  ... 
dblp:conf/bmvc/JinYLQ20 fatcat:jr7q3mlln5er3pty4y7kmgbbdi

Cheetah: Mixed Low-Precision Hardware Software Co-Design Framework for DNNs on the Edge [article]

Hamed F. Langroudi, Zachariah Carmichael, David Pastuch, Dhireesha Kudithipudi
2019 arXiv   pre-print
Additionally, the framework is amenable to different quantization approaches and supports mixed-precision floating-point and fixed-point numerical formats.  ...  Low-precision DNNs have been extensively explored in order to reduce the size of DNN models for edge devices.  ...  Following this work, Hashemi et al. introduce low-precision DNN inference networks to better understand the impact of numerical formats on the energy consumption and performance of DNNs [15], [16].  ... 
arXiv:1908.02386v1 fatcat:szxgn75itvgplggolli56kop4u

CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks [article]

Muhammad Abdullah Hanif, Giuseppe Maria Sarda, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
2022 arXiv   pre-print
We propose CoNLoCNN, a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting: (1) non-uniform quantization of weights enabling simplification of complex  ...  Fixed-Point (FP) implementations achieved through post-training quantization are commonly used to curtail the energy consumption of these networks.  ...  ACKNOWLEDGMENT This research is partly supported by the ASPIRE AARE Grant (S1561) on "Towards Extreme Energy Efficiency through Cross-Layer Approximate Computing".  ... 
arXiv:2208.00331v1 fatcat:navxy2jkxzeqhjwlyxdaewslz4
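The non-uniform weight quantization mentioned in item (1) above can be sketched as mapping each weight to its nearest level in a codebook whose levels cluster near zero, where most trained weights lie. The codebook below is illustrative only; the paper's actual level placement and encoding are not reproduced here.

    import numpy as np

    def quantize_to_codebook(w, codebook):
        # Map every weight to the nearest codebook level (non-uniform quantization).
        idx = np.argmin(np.abs(w[..., None] - codebook), axis=-1)
        return codebook[idx]

    # Illustrative non-uniform codebook: zero plus signed power-of-two levels.
    levels = np.sort(np.array([0.0] + [s * 2.0 ** e for s in (-1, 1)
                                       for e in range(-6, 0)]))
    rng = np.random.default_rng(4)
    w = rng.normal(0.0, 0.05, size=(4, 4))
    print(quantize_to_codebook(w, levels))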

Ternary MobileNets via Per-Layer Hybrid Filter Banks [article]

Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
2019 arXiv   pre-print
size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.  ...  Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4-6 bits, albeit with a modest to significant drop in accuracy  ...  as observed in Table 1, reducing the throughput and energy-efficiency of neural network inference.  ... 
arXiv:1911.01028v1 fatcat:7awopatwxjgrtfypejzh3qqsre
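Ternary quantization, the baseline the per-layer hybrid filter banks above build on, restricts each layer's weights to {-a, 0, +a}. A minimal sketch using a common magnitude-threshold heuristic (the 0.7 factor is a conventional choice from the ternary-weight literature, not taken from this paper):

    import numpy as np

    def ternarize(w):
        # Map weights to {-alpha, 0, +alpha} using a magnitude threshold.
        delta = 0.7 * np.mean(np.abs(w))               # conventional heuristic
        mask = np.abs(w) > delta
        alpha = np.mean(np.abs(w[mask])) if mask.any() else 0.0
        return alpha * np.sign(w) * mask

    rng = np.random.default_rng(5)
    w = rng.normal(0.0, 0.1, size=(3, 3))
    print(ternarize(w))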

Efficient Hybrid Network Architectures for Extremely Quantized Neural Networks Enabling Intelligence at the Edge [article]

Indranil Chakraborty, Deboleena Roy, Aayush Ankit, Kaushik Roy
2019 arXiv   pre-print
We explore several hybrid network architectures and analyze the performance of the networks in terms of accuracy, energy efficiency and memory compression.  ...  This has necessitated the search for efficient implementations of neural networks in terms of both computations and storage.  ...  ACKNOWLEDGEMENT This work was supported in part by the Center for Brain-inspired Computing Enabling Autonomous Intelligence (C-BRIC), one of six centers in JUMP, a Semiconductor Research Corporation (SRC  ... 
arXiv:1902.00460v1 fatcat:jwm7igbuxnfslkiu4xe3nzueyq

SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks [article]

Mi Sun Park, Xiaofan Xu, Cormac Brick
2019 arXiv   pre-print
Deep neural networks have achieved state-of-the-art accuracies in a wide range of computer vision, speech recognition, and machine translation tasks.  ...  Our method achieves state-of-the-art accuracies using 4-bit and 2-bit precision for ResNet18, MobileNet-v2 and ResNet50, even with a high degree of sparsity.  ...  However, applying one after the other not only requires two-stage training, but also makes it difficult to quantize with lower precision after pruning, due to a lack of understanding of the impact of pruning  ... 
arXiv:1812.08301v2 fatcat:4pbiirfk7rgwticqz4nm67ypmq

Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference

Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu
2021 Frontiers in Artificial Intelligence  
Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations.  ...  In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases.  ...  In addition, in this study we aim to understand how the network itself changes during training and optimization based on different NN configurations.  ... 
doi:10.3389/frai.2021.676564 fatcat:nbiy6vpzxnhkjjew3pe3it57li

AdaBits: Neural Network Quantization With Adaptive Bit-Widths

Qing Jin, Linjie Yang, Zhenyu Liao
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Deep neural networks with adaptive configurations have gained increasing attention due to the instant and flexible deployment of these models on platforms with different resource budgets.  ...  neural networks, offering a distinct opportunity for an improved accuracy-efficiency trade-off as well as instant adaptation to platform constraints in real-world applications.  ...  Acknowledgement The authors would like to thank Professor Hao Chen of the University of California, Davis and Professor Yi Ma of the University of California, Berkeley for invaluable discussions.  ... 
doi:10.1109/cvpr42600.2020.00222 dblp:conf/cvpr/JinYL20 fatcat:vbbyj2zktfevthtfop2fkmlpf4
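The adaptive-bit-width idea above can be pictured as keeping one set of full-precision weights and quantizing them on the fly at whichever bit-width the current platform budget allows. A minimal sketch (illustrative only; the paper's joint training across bit-widths is not shown):

    import numpy as np

    def quantize_uniform(w, bits):
        # Symmetric uniform quantization at a chosen bit-width.
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(w)) / qmax
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    rng = np.random.default_rng(6)
    w = rng.normal(0.0, 0.05, size=(64, 64))          # one shared weight tensor
    for bits in (8, 6, 4):                            # chosen per deployment budget
        wq = quantize_uniform(w, bits)
        print(bits, "bits ->", np.unique(wq).size, "distinct weight values")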
Showing results 1 — 15 out of 3,732 results