3,979 Hits in 5.3 sec

Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters [article]

Niccoló Nicodemo and Gaurav Naithani and Konstantinos Drossos and Tuomas Virtanen and Roberto Saletti
2019 arXiv   pre-print
The application of low-bit quantization allows a 50% reduction of the DNN memory footprint while the STOI performance drops by only 2.7%.  ...  Effective employment of deep neural networks (DNNs) in mobile devices and embedded systems is hampered by requirements for memory and computational power.  ... 
arXiv:1911.00527v1 fatcat:vvrqusiyirft7j6qrfn677priy
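The entry above reports halving a DNN's memory footprint via low-bit quantization of its parameters. As a rough illustration only (not the authors' method; the bit width and symmetric per-tensor scaling are assumptions), a minimal NumPy sketch of uniform weight quantization:

    import numpy as np

    def quantize_uniform(w, n_bits=8):
        # Symmetric uniform quantizer: map float weights to signed n-bit integers.
        qmax = 2 ** (n_bits - 1) - 1
        max_abs = float(np.max(np.abs(w)))
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover approximate float weights for inference.
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 128).astype(np.float32)   # hypothetical layer weights
    q, scale = quantize_uniform(w, n_bits=8)
    print(w.nbytes, "->", q.nbytes, "bytes")            # roughly 4x smaller at 8 bits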

Low-complexity Recurrent Neural Network-based Polar Decoder with Weight Quantization Mechanism [article]

Chieh-Fang Teng, Chen-Hsi Wu, Kuan-Shiuan Ho, An-Yeu Wu
2019 arXiv   pre-print
However, neural networks are memory-intensive and hinder the deployment of DL in communication systems.  ...  In this work, a low-complexity recurrent neural network (RNN) polar decoder with codebook-based weight quantization is proposed.  ...  the required number of parameters for neural network decoder.  ... 
arXiv:1810.12154v2 fatcat:mmdc2qyhf5co7bmiuleuhwgr6a
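Codebook-based weight quantization, as mentioned in the snippet above, stores each weight as an index into a small table of shared values. A minimal 1-D k-means-style sketch (the codebook size and clustering routine are assumptions, not the decoder implementation from the paper):

    import numpy as np

    def codebook_quantize(w, k=16, iters=10):
        # 1-D k-means over the weights: k shared values plus a per-weight index.
        flat = w.ravel()
        centers = np.linspace(flat.min(), flat.max(), k)
        for _ in range(iters):
            idx = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
            for j in range(k):
                if np.any(idx == j):
                    centers[j] = flat[idx == j].mean()
        return idx.reshape(w.shape).astype(np.uint8), centers

    w = np.random.randn(64, 64).astype(np.float32)
    idx, codebook = codebook_quantize(w, k=16)   # 4-bit indices plus 16 shared floats
    w_hat = codebook[idx]                        # weights reconstructed at inference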

A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing

Mário P. Véstias
2019 Algorithms  
CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized high-performance platforms.  ...  The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms.  ...  To reduce memory bandwidth and power requirements, the arithmetic units support 8- and 12-bit quantization.  ... 
doi:10.3390/a12080154 fatcat:jbdak7eisbcjtj6ba5hlpvnq5y

Weight-Quantized SqueezeNet for Resource-Constrained Robot Vacuums for Indoor Obstacle Classification

Qian Huang
2022 AI  
With the rapid development of artificial intelligence (AI) theory, particularly deep learning neural networks, robot vacuums equipped with AI power can automatically clean indoor floors by using intelligent  ...  As a result, these existing deep AI models require far more memory space than a typical robot vacuum can provide.  ... 
doi:10.3390/ai3010011 fatcat:6yv34bhlsngq5a6nl3crtsz6e4

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Qing Qin, Jie Ren, JiaLong Yu, Hai Wang, Ling Gao, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
2018 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)  
We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures and the implications of compression techniques to the model  ...  The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices.  ...  The reduction in the storage size is consistent across neural networks as the size of a network is dominated by its weights.  ... 
doi:10.1109/bdcloud.2018.00110 dblp:conf/ispa/QinRYWG0FFW18 fatcat:q6zjgqhqcngplgl67lh7fnjwsm
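Because a network's storage is dominated by its weights, as the snippet above notes, the saving from quantization and pruning is easy to estimate. A back-of-the-envelope sketch (the parameter count and compression settings are illustrative assumptions):

    # Rough storage estimate for a hypothetical model with 10M weights.
    n_params = 10_000_000
    fp32_mb = n_params * 4 / 1e6        # 32-bit floats: 40 MB
    int8_mb = n_params * 1 / 1e6        # 8-bit quantization: 10 MB
    pruned_int8_mb = int8_mb * 0.5      # plus 50% pruning (ignoring index overhead): 5 MB
    print(fp32_mb, int8_mb, pruned_int8_mb)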

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference [article]

Qing Qin, Jie Ren, Jialong Yu, Ling Gao, Hai Wang, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
2018 arXiv   pre-print
We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures and the implications of compression techniques to the model  ...  The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices.  ...  The reduction in the storage size is consistent across neural networks as the size of a network is dominated by its weights.  ... 
arXiv:1810.08899v1 fatcat:ewhclsajprh7tcp7kylxkftf3m

WRPN: Training and Inference using Wide Reduced-Precision Networks [article]

Asit Mishra, Jeffrey J Cook, Eriko Nurvitadhi, Debbie Marr
2017 arXiv   pre-print
For computer vision applications, prior works have shown the efficacy of reducing the numeric precision of model parameters (network weights) in deep neural networks but also that reducing the precision  ...  We reduce the precision of activation maps (along with model parameters) using a novel quantization scheme and increase the number of filter maps in a layer, and find that this scheme compensates or surpasses  ...  Due to such efficiency benefits, there are many existing works that have proposed low-precision deep neural networks (DNNs), even down to 2-bit ternary mode [5] and 1-bit mode [4, 1] .  ... 
arXiv:1704.03079v1 fatcat:iv6g7b3yvzhfzovq3ibvc7dhwi
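WRPN's idea, as described above, is to reduce the precision of activations along with weights and to compensate by widening layers (more filter maps). A minimal sketch of that trade-off (the widening factor and activation quantizer are assumptions, not the paper's exact scheme):

    import numpy as np

    def quantize_act(x, n_bits=4):
        # Uniformly quantize non-negative (post-ReLU) activations to 2^n_bits - 1 levels.
        levels = 2 ** n_bits - 1
        max_val = float(x.max())
        scale = max_val / levels if max_val > 0 else 1.0
        return np.round(x / scale) * scale

    acts = np.maximum(np.random.randn(1, 64, 32, 32), 0)   # NCHW activations after ReLU
    acts_q = quantize_act(acts, n_bits=4)                   # 4-bit activation values
    filters_wide = np.random.randn(128, 64, 3, 3)           # 2x more filter maps than a
                                                            # hypothetical 64-filter baseline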

Quantization of Deep Neural Networks for Accurate Edge Computing [article]

Wentao Chen, Hailong Qiu, Jian Zhuang, Chutong Zhang, Yu Hu, Qing Lu, Tianchen Wang, Yiyu Shi, Meiping Huang, Xiaowe Xu
2021 arXiv   pre-print
with 3.5x-6.4x memory reduction.  ...  Deep neural networks (DNNs) have demonstrated their great potential in recent years, exceeding the performance of human experts in a wide range of applications.  ...  Quantized neural networks, binarized neural networks, and XNOR-net [29] reduced the weights to only 1 bit and the activations to 1-2 bits, resulting in a large reduction in memory and computation cost  ... 
arXiv:2104.12046v2 fatcat:dltil2m2yrgnbp6vgfrf46l6va
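Binarized and XNOR-style networks such as those cited above keep only the sign of each weight plus a real-valued scale. A minimal sketch of per-filter weight binarization (the mean-absolute-value scale follows the general XNOR-Net idea; the layout and shapes are assumptions):

    import numpy as np

    def binarize_weights(w):
        # Per-filter binarization: signs in {-1, +1} plus a scale alpha equal to
        # the mean absolute value of that filter (in the spirit of XNOR-Net).
        alpha = np.mean(np.abs(w), axis=(1, 2, 3), keepdims=True)
        b = np.where(w >= 0, 1.0, -1.0)
        return b, alpha

    w = np.random.randn(32, 16, 3, 3).astype(np.float32)   # conv filters, OIHW layout
    b, alpha = binarize_weights(w)
    w_hat = alpha * b                                       # 1 bit per weight + 32 scales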

RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks

Cheng LUO, Wei CAO, Lingli WANG, Philip H. W. LEONG
2019 IEICE transactions on information and systems  
With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks  ...  In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them  ...  From the perspective of computation and memory cost, extensive quantization methods [10] - [13] are proposed to quantize neural networks with low-precision weights and activations, which drastically  ... 
doi:10.1587/transinf.2018rcp0008 fatcat:gmlkmykd6bevnfiqwklwicfiwy
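Per the snippet, QR-DNN trains with batch normalization and later removes it before quantization. A minimal sketch of the standard algebra for folding a BN layer into the preceding convolution (variable names and the OIHW layout are illustrative, not the paper's code):

    import numpy as np

    def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
        # Fold y = gamma * (conv(x, w) + b - mean) / sqrt(var + eps) + beta
        # into an equivalent convolution with new weights and bias.
        scale = gamma / np.sqrt(var + eps)            # one factor per output channel
        w_folded = w * scale[:, None, None, None]     # OIHW weight layout assumed
        b_folded = (b - mean) * scale + beta
        return w_folded, b_folded

    # Hypothetical conv layer with 32 output channels and its BN statistics.
    w, b = np.random.randn(32, 16, 3, 3), np.zeros(32)
    gamma, beta = np.ones(32), np.zeros(32)
    mean, var = np.zeros(32), np.ones(32)
    w_f, b_f = fold_bn_into_conv(w, b, gamma, beta, mean, var)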

Quantization and Deployment of Deep Neural Networks on Microcontrollers

Pierre-Emmanuel Novac, Ghouthi Boukli Hacene, Alain Pegatoquet, Benoît Miramond, Vincent Gripon
2021 Sensors  
This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers.  ...  However, there is still room for optimization of deep neural networks onto embedded devices.  ...  In Figure 6 , we can observe that the accuracy obtained using 8-bit and 16-bit quantization is similar only for deep neural networks exhibiting a reduced number of parameters, in other words a low memory  ... 
doi:10.3390/s21092984 pmid:33922868 pmcid:PMC8122998 fatcat:3hzk3tkvxbgurcy3o6wxmnssgm
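For microcontroller deployment as discussed above, weights and activations are commonly stored as fixed-point integers. A minimal sketch of converting floats to signed 8-bit Q0.7 fixed point, a common convention on 32-bit MCUs (the Q-format choice is an assumption, not the paper's configuration):

    import numpy as np

    def to_q7(x):
        # Convert floats in roughly [-1, 1) to signed 8-bit Q0.7 fixed point.
        return np.clip(np.round(x * 128), -128, 127).astype(np.int8)

    w = np.random.uniform(-1, 1, size=(128,)).astype(np.float32)
    w_q7 = to_q7(w)                            # 1 byte per weight in flash/RAM
    w_back = w_q7.astype(np.float32) / 128.0   # dequantized on the fly during inference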

ECQ^x: Explainability-Driven Quantization for Low-Bit and Sparse DNNs [article]

Daniel Becking, Maximilian Dreyer, Wojciech Samek, Karsten Müller, Sebastian Lapuschkin
2022 arXiv   pre-print
The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations.  ...  Experimental results show that this novel Entropy-Constrained and XAI-adjusted Quantization (ECQ^x) method generates ultra low-precision (2-5 bit) and simultaneously sparse neural networks while maintaining  ...  For instance, a reduction from standard 32 bit precision to 8 bit or 4 bit directly leads to a memory reduction of almost 4× and 8×.  ... 
arXiv:2109.04236v2 fatcat:xgrrejz33zfnpjclq4olzhix7i
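The arithmetic in the last snippet (32 bit to 8 bit is almost 4x smaller, 32 bit to 4 bit almost 8x, "almost" because of quantization metadata such as scales or codebooks) can be checked directly; the codebook overhead below is a hypothetical example:

    # Bytes for 1M weights at different bit widths, plus a small codebook overhead.
    n = 1_000_000
    for bits in (32, 8, 4):
        payload = n * bits / 8                             # quantized weight payload
        overhead = (2 ** bits) * 4 if bits < 32 else 0     # hypothetical float32 codebook
        print(f"{bits:2d} bit: {(payload + overhead) / 1e6:.3f} MB")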

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression [article]

Shihui Yin, Gaurav Srivastava, Shreyas K. Venkataramanaiah, Chaitali Chakrabarti, Visar Berisha, Jae-sun Seo
2018 arXiv   pre-print
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which  ...  However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks has not been comprehensively explored.  ...  DNN model with 8X compression and 3-bit weight quantization (10X weight memory reduction) shows minimal accuracy degradation of 0.45% compared to high precision and uncompressed network.  ... 
arXiv:1804.07370v1 fatcat:hirsopx7czbexffugk6a3iixmm
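The 10X figure quoted above follows from the bit width alone (32/3 is roughly 10.7x); the 8X structured compression multiplies on top of that in an idealized accounting, which may differ from the paper's exact bookkeeping:

    # Weight-memory reduction from 3-bit quantization, alone and combined with
    # 8x structured compression (idealized, ignoring index/codebook overhead).
    n = 1_000_000
    fp32_bits = n * 32
    q3_bits = n * 3
    print(fp32_bits / q3_bits)          # ~10.7x from the bit width alone
    print(fp32_bits / (q3_bits / 8))    # ~85x if 8x compression also removes weights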

A Targeted Acceleration and Compression Framework for Low bit Neural Networks [article]

Biao Qian, Yang Wang
2019 arXiv   pre-print
1-bit deep neural networks (DNNs), in which both the activations and weights are binarized, are attracting more and more attention due to their high computational efficiency and low memory requirement  ...  For the fully connected layers, the binarization operation is replaced by network pruning and low-bit quantization.  ...  However, 1-bit deep neural networks suffer a sharp reduction of prediction accuracy when both activations and weights are binarized.  ... 
arXiv:1907.05271v1 fatcat:akcy4tq2ozailmivuvso6lkfae

Iteratively Training Look-Up Tables for Network Quantization [article]

Fabien Cardinaux and Stefan Uhlich and Kazuki Yoshiyama and Javier Alonso García and Stephen Tiedemann and Thomas Kemp and Akira Nakamura
2018 arXiv   pre-print
Operating deep neural networks on devices with limited resources requires the reduction of their memory footprints and computational requirements.  ...  In order to obtain fully multiplier-less networks, we also introduce a multiplier-less version of batch normalization.  ...  We have presented look-up table quantization, a novel approach for reducing the size and computation of deep neural networks.  ... 
arXiv:1811.05355v1 fatcat:usziojexc5c5vfufvcoaknk4ma
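Look-up table quantization, as presented above, constrains each weight to one of a small set of shared table entries, so only an index per weight is stored. A minimal assignment-and-lookup sketch (the table values and nearest-entry rule are assumptions; the paper learns the table iteratively during training):

    import numpy as np

    def lut_assign(w, table):
        # Map each weight to the index of its nearest look-up table entry.
        return np.argmin(np.abs(w[..., None] - table), axis=-1).astype(np.uint8)

    table = np.array([-0.5, -0.25, -0.125, 0.0, 0.125, 0.25, 0.5, 1.0], dtype=np.float32)
    w = np.random.randn(64, 64).astype(np.float32)
    idx = lut_assign(w, table)      # 3-bit indices (stored here in uint8 for simplicity)
    w_hat = table[idx]              # table values used in the forward pass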

Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms

Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella
2020 IEEE Journal on Selected Topics in Signal Processing  
This challenge discourages IoT implementation, where an efficient use of resources is required.  ...  test our approach on an ARM Cortex M4, particularly focusing on issues related to 8-bit quantization.  ...  These results paved the way for the effective use of deep neural networks on a low-power microcontroller to enable SED.  ... 
doi:10.1109/jstsp.2020.2969775 fatcat:pjsjujou6zcj7i2jggmgsgcwla
Showing results 1 — 15 out of 3,979 results