3,742 Hits in 5.4 sec

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [article]

Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang
2020 arXiv   pre-print
Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures.  ...  To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method  ...  Block-based Structured Pruning -A Unique Perspective on Structured Weight Pruning Conventional structured pruning treats the DNN weight matrix in each layer as a whole, and selects to prune a whole row  ... 
arXiv:2001.08357v2 fatcat:7nl364nd4bd6nozcmmvwg4rmdu
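A minimal numpy sketch of the block-based pruning dimension this abstract describes: the weight matrix is tiled into fixed-size blocks and whole low-norm blocks are zeroed, so the surviving weights stay in dense, hardware-friendly tiles. Block shape, keep ratio, and the L2-norm score are illustrative assumptions, not the paper's settings (BLK-REW uses reweighted regularization rather than one-shot thresholding).

```python
import numpy as np

def block_prune(weight, block_shape=(4, 4), keep_ratio=0.5):
    """Zero out entire blocks of a weight matrix by block-wise L2 norm."""
    rows, cols = weight.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "matrix must tile evenly"
    # View the matrix as a grid of (br x bc) blocks and score each block.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    # Keep only the top-norm fraction of blocks; zero the rest wholesale.
    k = max(1, int(keep_ratio * norms.size))
    threshold = np.sort(norms, axis=None)[-k]
    mask = (norms >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_pruned = block_prune(W, block_shape=(4, 4), keep_ratio=0.5)
```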

AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency [article]

Fuxun Yu, Chenchen Liu, Di Wang, Yanzhi Wang, Xiang Chen
2020 arXiv   pre-print
Based on the neural network attention mechanism, we propose a comprehensive dynamic optimization framework including (1) testing-phase channel and column feature map pruning, as well as (2) training-phase  ...  Therefore, we propose a dynamic CNN optimization framework in this work.  ...  Conclusion In this work, we propose a dynamic feature map pruning method based on attention mechanism.  ... 
arXiv:2008.06543v1 fatcat:kvgqjpjlxbaivfhebjeanfy3hu
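A toy sketch of the testing-phase channel pruning idea from this abstract: score each channel of a feature map and keep only the top fraction. Mean activation magnitude is used here as a simple stand-in for the paper's attention mechanism; the function name and keep ratio are hypothetical.

```python
import numpy as np

def attention_channel_mask(feature_map, keep_ratio=0.5):
    """Score each channel of a (C, H, W) feature map by its mean activation
    magnitude (a stand-in for an attention score) and keep the top channels."""
    c = feature_map.shape[0]
    scores = np.abs(feature_map).mean(axis=(1, 2))
    k = max(1, int(keep_ratio * c))
    keep = np.argsort(scores)[-k:]        # indices of the k highest-scoring channels
    mask = np.zeros(c, dtype=bool)
    mask[keep] = True
    return mask

# Channel i carries constant activation i, so higher channels score higher.
fmap = np.stack([np.full((2, 2), float(i)) for i in range(4)])
mask = attention_channel_mask(fmap, keep_ratio=0.5)
```

At inference, the mask would be applied dynamically per input, skipping the computation of the pruned channels.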

An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication

Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
of the input feature map coupled with a systolic-array-based general matrix-matrix multiplication (GEMM) unit.  ...  The systolic-array-based GEMM unit in the accelerator can be dynamically configured as multiple GEMM units with square-shaped systolic arrays or as a single GEMM unit with a tall systolic array.  ...  This article proposes SPOTS, a hardware accelerator for sparse CNNs with a matrix multiplication formulation of convolution using the Im2Col transformation.  ... 
doi:10.1145/3532863 fatcat:7iny5yokebaepi5fedmvh4hyku
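The Im2Col transformation the accelerator builds a hardware unit for can be sketched in a few lines: every kernel-sized patch of the input is unrolled into a column, after which convolution is a single GEMM. This deliberately minimal version assumes stride 1, no padding, and a single channel.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh x kw patch of a single-channel image into a column."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_as_gemm(x, kernel):
    """Convolution (cross-correlation, as in most DNN frameworks) expressed
    as one matrix-matrix multiplication over the Im2Col matrix."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (kernel.ravel() @ im2col(x, kh, kw)).reshape(oh, ow)

x = np.arange(25.0).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0          # 3x3 mean filter
out = conv_as_gemm(x, kernel)           # out[0, 0] is the mean of x[0:3, 0:3]
```

With many filters, the kernel side becomes a matrix as well, and sparse filters make that matrix sparse, which is what the systolic GEMM unit exploits.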

Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning

Jianda Chen, Shangyu Chen, Sinno Jialin Pan
2020 Neural Information Processing Systems  
Comparison experimental results with existing runtime and static pruning methods on state-of-the-art CNNs demonstrate that our proposed framework is able to provide a tradeoff between dynamic flexibility  ...  In this paper, we propose a deep reinforcement learning (DRL) based framework to efficiently perform runtime channel pruning on convolutional neural networks (CNNs).  ...  Broader Impact This work is basic research on neural networks compression. We believe this is not applicable to our work.  ... 
dblp:conf/nips/ChenCP20 fatcat:sfevrol4tzb73l5mwritjxybym

A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration

Deepak Ghimire, Dayoung Kil, Seong-heum Kim
2022 Electronics  
The learning capability of convolutional neural networks (CNNs) originates from a combination of various feature extraction layers that fully utilize a large amount of data.  ...  Over the past decade, deep-learning-based representations have demonstrated remarkable performance in academia and industry.  ...  In surveying efficient CNN architectures and hardware acceleration, we are deeply grateful again for all the researchers and their contributions to our science.  ... 
doi:10.3390/electronics11060945 fatcat:bxxgccwkujatzh4onkzh5lgspm

Faster CNNs with Direct Sparse Convolutions and Guided Pruning [article]

Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey
2017 arXiv   pre-print
We present a method to realize simultaneously size economy and speed improvement while pruning CNNs.  ...  The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained.  ...  We would also like to thank Nitish Shirish Keskar for his recommendations on hyper-parameter settings.  ... 
arXiv:1608.01409v5 fatcat:zf2jtci5b5bz5chgc2o42zu7vy
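A sketch of the classic magnitude-based train-then-prune step this line of work builds on: zero the smallest-magnitude weights up to a target sparsity. The paper's guided pruning additionally steers the sparsity toward layers where direct sparse convolution actually pays off; that part is omitted here.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude entries of w."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > thresh, w, 0.0)

w = np.arange(1.0, 11.0)              # toy "weights" 1..10
pruned = magnitude_prune(w, sparsity=0.4)
```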

ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning [article]

Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao
2021 arXiv   pre-print
By leveraging pattern-based pruning with our proposed novel accurate weight importance estimation, dynamic pattern generation and selection, and compiler-assisted computation optimizations, ClickTrain  ...  The wide and deep CNNs, however, require a large amount of computing resources and processing time.  ...  The previous methods [28, 36] determine the importance of a certain weight based on its magnitude, which requires a well-trained CNN model whose weights will not change dramatically and are well distributed  ... 
arXiv:2011.10170v2 fatcat:xxf7ywv3rnehxc5db2huvvycna

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication [article]

Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte
2021 arXiv   pre-print
coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit.  ...  This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map  ...  ACKNOWLEDGMENTS This material is based upon work supported in part by the National Science Foundation under Grant No. 1908798.  ... 
arXiv:2107.13386v2 fatcat:k7oampka5rdztojmmwrr2yvnfm

Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion [article]

Mingbao Lin, Rongrong Ji, Bohong Chen, Fei Chao, Jianzhuang Liu, Wei Zeng, Yonghong Tian, Qi Tian
2021 arXiv   pre-print
In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image  ...  Each filter in our DCFF is firstly given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion  ...  A bunch of existing methods build filter pruning on top of a pretrained CNN model [7] , [8] , [24] - [30] .  ... 
arXiv:2107.06916v1 fatcat:hkrtjcz33rhvhim3zf26brr3mi
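An illustrative reading of the filter proxy this abstract mentions, not the paper's exact DCFF criterion: each filter gets a temperature-controlled softmax distribution over its negative distances to the other filters (the "inter-similarity distribution"), and filters whose distributions diverge least from the rest (low mean KL divergence) can be treated as redundant.

```python
import numpy as np

def filter_redundancy_scores(filters, temperature=1.0):
    """Score filters by the mean KL divergence of their inter-similarity
    distribution to the other filters' distributions."""
    n = filters.shape[0]
    flat = filters.reshape(n, -1)
    # Pairwise Euclidean distances between flattened filters.
    dist = np.linalg.norm(flat[:, None] - flat[None, :], axis=-1)
    # Row-wise softmax over negative distances, with a temperature.
    logits = -dist / temperature
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    eps = 1e-12
    # kl[i, j] = KL(p_i || p_j); a low mean marks a redundant filter.
    kl = (p[:, None] * np.log((p[:, None] + eps) / (p[None, :] + eps))).sum(-1)
    return kl.mean(axis=1)

rng = np.random.default_rng(1)
filters = rng.normal(size=(6, 3, 3))
scores = filter_redundancy_scores(filters, temperature=0.5)
```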

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [article]

Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
2021 arXiv   pre-print
One potential remedy for this is model compression, which has attracted a lot of research attention.  ...  Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.  ...  Acknowledgement This publication was made possible by NPRP grant NPRP10-0208-170408 from the Qatar National Research Fund (a member of Qatar Foun-dation).  ... 
arXiv:2002.11985v2 fatcat:bqkgu2fwinaxretgtjimwwthmu

Structured Pruning for Efficient ConvNets via Incremental Regularization [article]

Huan Wang, Qiming Zhang, Yuehai Wang, Yu Lu, Haoji Hu
2019 arXiv   pre-print
To achieve this, we propose a novel regularization-based pruning method, named IncReg, to incrementally assign different regularization factors to different weights based on their relative importance  ...  expressiveness of CNNs, and thus calls for a more gentle regularization scheme so that the networks can adapt during pruning.  ...  Importance-based pruning methods prune weights in groups based on some established importance criteria.  ... 
arXiv:1804.09461v2 fatcat:qsea67a7n5cofmrjr43exs75cq
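The importance-to-factor assignment the abstract describes can be sketched as follows: rank weight groups by an importance score (e.g. a filter's L1 norm) and give less important groups larger L2 penalty factors. IncReg additionally *grows* these factors incrementally over training; this sketch shows only the static assignment, and the linear rank-to-factor mapping is an assumption.

```python
import numpy as np

def assign_reg_factors(group_importance, lam_max=1e-2):
    """Map per-group importance to an L2 regularization factor: the least
    important group gets lam_max, the most important gets 0."""
    imp = np.asarray(group_importance, dtype=float)
    n = imp.size
    ranks = np.argsort(np.argsort(imp))    # 0 = least important group
    return lam_max * (1.0 - ranks / max(n - 1, 1))

importance = np.array([3.0, 1.0, 2.0])     # e.g. filter L1 norms
factors = assign_reg_factors(importance, lam_max=1e-2)
```

During training, each group's weights would then be penalized by its own factor, gently pushing unimportant groups toward zero.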

A Survey of Model Compression and Acceleration for Deep Neural Networks [article]

Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang
2020 arXiv   pre-print
Then we go through some very recent successful methods, for example, dynamic capacity networks and stochastic depth networks.  ...  Methods of parameter pruning and quantization are described first; after that, the other techniques are introduced.  ...  In [25] , a simple regularization method based on soft weight-sharing was proposed, which included both quantization and pruning in one simple (re-)training procedure.  ... 
arXiv:1710.09282v9 fatcat:frwedew2gfe3rjif5ds75jqay4

Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going [article]

Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides
2019 arXiv   pre-print
We also include proposals for future research based on a thorough analysis of current trends.  ...  Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic.  ...  Similar neurons can be wired together and hence pruned away. The authors proposed the similarity evaluation of neurons using a matrix of their squared Euclidean distances. This method resulted in 6.7× and  ... 
arXiv:1901.06955v3 fatcat:rkgo2oisdrgv3dtnbtlldlkpba

Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition [article]

Xuefeng Xiao, Lianwen Jin, Yafeng Yang, Weixin Yang, Jun Sun, Tianhai Chang
2017 arXiv   pre-print
Like other problems in computer vision, offline handwritten Chinese character recognition (HCCR) has achieved impressive results using convolutional neural network (CNN)-based methods.  ...  Furthermore, when integrated with our effective forward implementation, the recognition of an offline character image took only 9.7 ms on a CPU.  ...  [16] proposed dynamic network surgery that can dynamically prune and splice connections based on Han's work [15] .  ... 
arXiv:1702.07975v1 fatcat:paxuegfmg5azdd445xd7esstru

Coreset-Based Neural Network Compression [article]

Abhimanyu Dubey, Moitreya Chatterjee, Narendra Ahuja
2018 arXiv   pre-print
We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters.  ...  Our method requires no retraining, is easy to implement, and obtains state-of-the-art compression performance across a wide variety of CNN architectures.  ...  The previous state of the art method, Dynamic Net Surgery [15] , requires 140 epochs (in time units) whereas our method takes at most 28 epochs (in time units), a significant reduction of 80%.  ... 
arXiv:1807.09810v1 fatcat:r5deensjdjhfjcfdxjkrfjaopi
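As a loose stand-in for the flavor of filter-set compression this abstract describes (it is not the paper's coreset construction), a filter bank can be replaced by a small set of representative basis filters plus per-filter mixing coefficients via truncated SVD:

```python
import numpy as np

def lowrank_compress_filters(filters, rank):
    """Replace an (N, k, k) filter bank with `rank` basis filters plus
    per-filter mixing coefficients via truncated SVD."""
    n = filters.shape[0]
    flat = filters.reshape(n, -1)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:rank]                     # rank x (k*k) representative filters
    coeffs = u[:, :rank] * s[:rank]       # N x rank mixing weights
    return coeffs, basis

rng = np.random.default_rng(2)
filters = rng.normal(size=(4, 2, 2))
# At full rank the factorization reconstructs the filters exactly;
# compression comes from choosing rank < N.
coeffs, basis = lowrank_compress_filters(filters, rank=4)
```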
Showing results 1 — 15 out of 3,742 results