913 Hits in 6.3 sec

Neural Machine Translation with 4-Bit Precision and Beyond [article]

Alham Fikri Aji, Kenneth Heafield
2019 arXiv   pre-print
We also propose to use an error-feedback mechanism during retraining to preserve the compressed model as a stale gradient.  ...  We empirically show that NMT models based on the Transformer or RNN architecture can be compressed up to 4-bit precision without any noticeable quality degradation.  ...  We use error feedback to preserve the compressed model as a stale gradient, rather than discarding it every update during retraining.  ... 
arXiv:1909.06091v2 fatcat:m2dsb2dhvzagfpyacxo2otkpve
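
The error-feedback idea in the snippet above can be illustrated in a few lines: instead of discarding the quantization error at every update, it is carried over and re-injected before the next quantization. The sketch below is a minimal NumPy illustration with a toy uniform 4-bit quantizer and a placeholder gradient; it is not the authors' code, and all names are illustrative.

```python
import numpy as np

def quantize_4bit(w, num_levels=16):
    """Uniform symmetric quantizer to `num_levels` levels (illustrative)."""
    scale = np.abs(w).max() / (num_levels // 2 - 1) + 1e-12
    return np.round(w / scale) * scale

# Error feedback: the quantization residual is carried over and added back
# before the next quantization, instead of being discarded every update.
w = np.random.randn(4, 4)          # full-precision master weights
residual = np.zeros_like(w)        # accumulated compression error
lr = 0.1
for step in range(100):
    grad = 0.01 * w                     # placeholder gradient (toy objective)
    w = w - lr * grad                   # full-precision update
    w_q = quantize_4bit(w + residual)   # compress, re-injecting past error
    residual = (w + residual) - w_q     # store the new quantization error
    # w_q is what the compressed model would use at this step
```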

A Survey on Dynamic Neural Networks for Natural Language Processing [article]

Canwen Xu, Julian McAuley
2022 arXiv   pre-print
Dynamic neural networks could be a promising solution to the growing parameter numbers of pretrained language models, allowing both model pretraining with trillions of parameters and faster inference on  ...  Effectively scaling large Transformer models is a main driver of recent advances in natural language processing.  ...  If the gate decides to skip a time step, the hidden states will be directly copied without any update.  ... 
arXiv:2202.07101v1 fatcat:c62x43swubhwzlfsw44cax7e5q
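
The gating idea in the snippet, skipping a time step by copying the hidden state unchanged, can be sketched with a small PyTorch cell. This is a generic illustration of dynamic computation in RNNs, not the survey's or any surveyed paper's exact model; the class name, gate, and hard 0.5 threshold are assumptions.

```python
import torch
import torch.nn as nn

class SkipGRUCell(nn.Module):
    """Toy dynamic RNN cell: a learned gate decides whether to update the
    hidden state or copy it unchanged for the current time step."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.gate = nn.Linear(input_size + hidden_size, 1)

    def forward(self, x_t, h):
        p_update = torch.sigmoid(self.gate(torch.cat([x_t, h], dim=-1)))
        update = (p_update > 0.5).float()   # hard decision (no straight-through estimator here)
        h_new = self.cell(x_t, h)
        # If the gate skips the step, the hidden state is copied without update.
        return update * h_new + (1 - update) * h

x = torch.randn(8, 5, 16)               # (batch, time, features)
h = torch.zeros(8, 32)
cell = SkipGRUCell(16, 32)
for t in range(x.size(1)):
    h = cell(x[:, t], h)
```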

Object Tracking Through Residual and Dense LSTMs [chapter]

Fabio Garcea, Alessandro Cucco, Lia Morra, Fabrizio Lamberti
2020 Lecture Notes in Computer Science  
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.  ...  By introducing skip connections, it is possible to increase the depth of the architecture while ensuring fast convergence.  ...  The original model has been retrained following the steps in the original publication on the ILSVRC2014 DET and ILSVRC2017 VID datasets, starting from the original code provided by the authors.  ... 
doi:10.1007/978-3-030-50516-5_9 fatcat:7b5prfwit5fo7linoebp5pd2j4
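
As a rough illustration of the residual idea the snippet refers to, adding skip connections around stacked LSTM layers looks like the PyTorch sketch below (a toy stack, not the tracker architecture studied in the chapter).

```python
import torch
import torch.nn as nn

class ResidualLSTM(nn.Module):
    """Toy stack of LSTM layers with skip (residual) connections between
    layers, so depth can grow while keeping optimization well-behaved."""
    def __init__(self, size, depth):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.LSTM(size, size, batch_first=True) for _ in range(depth)]
        )

    def forward(self, x):
        for lstm in self.layers:
            out, _ = lstm(x)
            x = x + out          # residual connection around each LSTM layer
        return x

y = ResidualLSTM(size=64, depth=4)(torch.randn(2, 10, 64))
```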

Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models

Nesma M. Rezk, Tomas Nordström, Zain Ul-Abdin
2022 Information  
We show how to apply post-training quantization on these models with a minimal increase in the error by skipping quantization of selected paths.  ...  In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/info13040176 fatcat:tvquqdn3cffedltvbm2h6xan3u
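
A minimal sketch of the post-training strategy described above: quantize most weight matrices after training, but leave selected paths (here, hypothetically, the recurrent path) in full precision to limit the error increase. Matrix names and the choice of skipped path are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def linear_quant(w, bits=8):
    """Post-training uniform quantization of a weight matrix (illustrative)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale).astype(np.int8), scale

# Toy LSTM weights; keys are illustrative, not tied to any framework layout.
weights = {
    "W_ih": np.random.randn(256, 64),   # input-to-hidden
    "W_hh": np.random.randn(256, 64),   # hidden-to-hidden (recurrent path)
    "W_out": np.random.randn(30, 64),   # output projection
}
skip = {"W_hh"}   # paths left in full precision to limit the error increase

quantized = {}
for name, w in weights.items():
    if name in skip:
        quantized[name] = w                              # skipped path stays float
    else:
        q, scale = linear_quant(w, bits=8)
        quantized[name] = q.astype(np.float32) * scale   # dequantized view
```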

Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-Temporal Sparsity [article]

Chang Gao, Tobi Delbruck, Shih-Chii Liu
2022 arXiv   pre-print
The pruned networks running on Spartus hardware achieve weight sparsity of up to 96% and 94% with negligible accuracy loss on the TIMIT and the Librispeech datasets.  ...  To induce temporal sparsity in LSTM, we extend the previous DeltaGRU method to the DeltaLSTM method.  ...  We thank Linares-Barranco from the University of Seville for creating the baseboard for our FPGAs.  ... 
arXiv:2108.02297v4 fatcat:7lyeb4z2cvbhtnczrnbwgudosa
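
The temporal-sparsity idea behind DeltaGRU/DeltaLSTM can be sketched as a delta matrix-vector product: only input components whose change exceeds a threshold trigger computation, and the previous partial result is reused. The NumPy code below is a dense-math illustration of that principle, not the Spartus hardware datapath; the threshold value is arbitrary.

```python
import numpy as np

def delta_matvec(W, x_t, x_prev, y_prev, theta=0.1):
    """Delta-network style product: components whose change exceeds `theta`
    trigger an incremental update of W @ x; the rest reuse the cached result."""
    delta = x_t - x_prev
    mask = np.abs(delta) >= theta          # temporally sparse update mask
    delta = np.where(mask, delta, 0.0)
    x_kept = np.where(mask, x_t, x_prev)   # states below threshold are not refreshed
    y_t = y_prev + W @ delta               # incremental update of the cached W @ x
    return y_t, x_kept

W = np.random.randn(32, 16)
x_prev = np.zeros(16)
y_prev = np.zeros(32)
for _ in range(5):
    x_t = x_prev + 0.05 * np.random.randn(16)   # slowly changing input stream
    y_prev, x_prev = delta_matvec(W, x_t, x_prev, y_prev)
```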

Compressing Neural Machine Translation Models with 4-bit Precision

Alham Fikri Aji, Kenneth Heafield
2020 Proceedings of the Fourth Workshop on Neural Generation and Translation  
We empirically show that NMT models based on the Transformer or RNN architectures can be compressed up to 4-bit precision without any noticeable quality degradation.  ...  Models can be compressed up to binary precision, albeit with lower quality. The RNN architecture appears more robust towards compression, compared to the Transformer.  ...  Acknowledgements This work was conducted within the scope of the Horizon 2020 Research and Innovation Action Bergamot, which has received funding from the European Union's Horizon 2020 research and innovation  ... 
doi:10.18653/v1/2020.ngt-1.4 dblp:conf/aclnmt/AjiH20 fatcat:72vsp4426zfa5fvoircn53dpfe
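
For a concrete feel of 4-bit compression, the sketch below maps a weight matrix onto 16 log-spaced levels (a sign plus a small integer exponent). It is only one simple quantizer, assumed here for illustration; the paper's exact quantization and retraining scheme may differ.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Log-scale quantization to 2**bits codes: each weight is stored as a sign
    and a clipped integer power-of-two exponent relative to the tensor maximum
    (a sketch, not the paper's exact scheme)."""
    levels = 2 ** (bits - 1)                               # exponent range per sign
    base = np.abs(w).max()
    exp = np.clip(np.round(np.log2(np.abs(w) / base + 1e-12)), -levels + 1, 0)
    return np.sign(w) * base * (2.0 ** exp)                # dequantized weights

w = np.random.randn(512, 512) * 0.1
w_q = log_quantize(w, bits=4)
print("mean abs quantization error:", np.abs(w - w_q).mean())
```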

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling [article]

Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han
2018 arXiv   pre-print
By introducing the dense connectivity, we can detach any layer without affecting others, and stretch shallow and wide LMs to be deep and narrow.  ...  As different layers of the model keep different information, we develop a layer selection method for model pruning using sparsity-inducing regularization.  ... 
arXiv:1804.07827v2 fatcat:nxaid4bfhjbotjn4vh72h65ugy
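
One way to picture the layer-selection idea: attach a learnable scalar gate to each layer's output and add an L1 (sparsity-inducing) penalty on the gates, so layers whose gates collapse to zero can be pruned. The PyTorch sketch below uses a simplified parallel layout rather than the paper's dense connectivity, and all hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class GatedLayerStack(nn.Module):
    """Toy model with a learnable scalar gate per layer; an L1 penalty on the
    gates drives some to zero, marking those layers as prunable (a sketch of
    the idea, not the authors' densely connected LM)."""
    def __init__(self, size, depth):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.LSTM(size, size, batch_first=True) for _ in range(depth)]
        )
        self.gates = nn.Parameter(torch.ones(depth))

    def forward(self, x):
        outs = []
        for g, lstm in zip(self.gates, self.layers):
            out, _ = lstm(x)
            outs.append(g * out)                 # gated contribution of each layer
        return torch.stack(outs).sum(dim=0)

model = GatedLayerStack(size=32, depth=4)
y = model(torch.randn(2, 7, 32))
l1_penalty = 1e-3 * model.gates.abs().sum()      # sparsity-inducing regularizer
loss = y.pow(2).mean() + l1_penalty              # placeholder task loss
loss.backward()
```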

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Liyuan Liu, Xiang Ren, Jingbo Shang, Xiaotao Gu, Jian Peng, Jiawei Han
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
By introducing the dense connectivity, we can detach any layer without affecting others, and stretch shallow and wide LMs to be deep and narrow.  ...  As different layers of the model keep different information, we develop a layer selection method for model pruning using sparsity-inducing regularization.  ... 
doi:10.18653/v1/d18-1153 dblp:conf/emnlp/LiuRSG0018 fatcat:icjtjgcemrc5xggzw4yuv5s3je

AutoGAN: Neural Architecture Search for Generative Adversarial Networks [article]

Xinyu Gong, Shiyu Chang, Yifan Jiang, Zhangyang Wang
2019 arXiv   pre-print
Specifically, our discovered architectures achieve highly competitive performance compared to current state-of-the-art hand-crafted GANs, e.g., setting new state-of-the-art FID scores of 12.42 on CIFAR  ...  We define the search space for the generator architectural variations and use an RNN controller to guide the search, with parameter sharing and dynamic-resetting to accelerate the process.  ...  Here we use an (s+5)-element tuple (skip_1, ..., skip_s, C, N, U, SC) to categorize the s-th cell (Figure 1: the running scheme of the RNN controller).  ... 
arXiv:1908.03835v1 fatcat:y4yo5wpdcndyfhfdrkxvjpkzey
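
The RNN-controller pattern mentioned in the snippet, sampling one architectural decision per step and feeding it back as the next input, can be sketched as follows. This is a generic controller, not AutoGAN's search space, reward, or training loop; the number of choices and decisions are placeholders.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Toy RNN controller: each step samples one architectural decision and
    feeds the sampled token back in as the next input."""
    def __init__(self, num_choices=4, hidden=64):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(num_choices, hidden)
        self.rnn = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_choices)

    def sample(self, num_decisions):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        token = torch.zeros(1, dtype=torch.long)
        decisions, log_probs = [], []
        for _ in range(num_decisions):
            h, c = self.rnn(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            token = dist.sample()
            decisions.append(token.item())
            log_probs.append(dist.log_prob(token))   # usable for a REINFORCE update
        return decisions, torch.stack(log_probs).sum()

# e.g. sample (skip_1, ..., skip_s, C, N, U, SC)-style decisions for one cell
arch, logp = Controller().sample(num_decisions=7)
```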

Faster Discovery of Neural Architectures by Searching for Paths in a Large Model

Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean
2018 International Conference on Learning Representations  
On the Penn Treebank dataset, ENAS can discover a novel architecture that achieves a test perplexity of 57.8, which is state-of-the-art among automatic model design methods on Penn Treebank.  ...  Meanwhile the model corresponding to the selected path is trained to minimize the cross entropy loss.  ...  When retraining the architecture recommended by the controller, however, we use variational dropout (Gal & Ghahramani, 2016), an ℓ2 regularization with weight decay of 10^-7, and a state slowness regularization  ... 
dblp:conf/iclr/PhamGZLD18 fatcat:egik4rgcgzbp3i57hjdsmnygkq
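
Searching for paths in a large shared model can be pictured as below: one "supermodel" holds candidate operations per layer, a sampled path selects one op per layer, and only the chosen path's parameters receive gradients in that step. The sketch is a toy illustration of weight sharing, not the ENAS controller or its Penn Treebank cell search.

```python
import random
import torch
import torch.nn as nn

class SharedModel(nn.Module):
    """Toy 'large model' holding shared weights for several candidate ops per
    layer; a sampled path picks one op per layer for the current step."""
    def __init__(self, width=32, depth=3, ops_per_layer=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.ModuleList([nn.Linear(width, width) for _ in range(ops_per_layer)])
             for _ in range(depth)]
        )

    def forward(self, x, path):
        for layer_ops, choice in zip(self.layers, path):
            x = torch.relu(layer_ops[choice](x))
        return x

model = SharedModel()
path = [random.randrange(4) for _ in model.layers]   # a controller would sample this
out = model(torch.randn(8, 32), path)
loss = out.pow(2).mean()                             # placeholder cross-entropy stand-in
loss.backward()   # gradients flow only through the ops on the sampled path
```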

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings [article]

Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
2019 arXiv   pre-print
We strive to reduce the energy cost during training by dropping unnecessary computations from three complementary levels: stochastic mini-batch dropping on the data level; selective layer update on the model level; and sign prediction for low-cost, low-precision back-propagation on the algorithm level.  ...  Acknowledgments The work is in part supported by the NSF RTML grant (1937592, 1937588).  ... 
arXiv:1910.13349v4 fatcat:pcoe2g42xfhfrh57swarlrlc7u
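
Of the three levels listed in the abstract, the data-level one (stochastic mini-batch dropping) is the simplest to sketch: each mini-batch is skipped with some probability, so no forward or backward pass is paid for it. The helper below is an illustration with placeholder callables, not the paper's training pipeline.

```python
import random

def train_with_smd(batches, train_step, drop_prob=0.5):
    """Stochastic mini-batch dropping: skip each incoming mini-batch with
    probability `drop_prob`, cutting data-level training cost
    (a sketch; `train_step` is a placeholder for one optimizer update)."""
    for batch in batches:
        if random.random() < drop_prob:
            continue              # dropped batch: no forward/backward pass
        train_step(batch)

# usage: train_with_smd(data_loader, lambda b: optimizer_step(model, b))
```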

MOHAQ: Multi-Objective Hardware-Aware Quantization of Recurrent Neural Networks [article]

Nesma M. Rezk, Tomas Nordström, Dimitrios Stathis, Zain Ul-Abdin, Eren Erdal Aksoy, Ahmed Hemani
2022 arXiv   pre-print
Second, we propose the "beacon-based search" to retrain selected solutions only and use them as beacons to know the effect of retraining on other solutions.  ...  The compression of deep learning models is of fundamental importance in deploying such models to edge devices.  ...  The authors would also like to acknowledge the contribution of Tiago Fernandes Cortinhal in setting up the Python libraries and Yu Yang in the thoughtful discussions about SiLago architecture.  ... 
arXiv:2108.01192v3 fatcat:uhzdmfdqenbfbot5dmeao2dmmy
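
A loose sketch of how a beacon-style estimate could work, under the assumption that retraining recovers a roughly similar amount of quality across nearby solutions: retrain only a few candidates, measure their recovery, and use the average to predict the rest. This illustrates the stated idea only and is not the paper's multi-objective search procedure.

```python
import random

def beacon_search(candidates, evaluate, retrain, num_beacons=3):
    """Retrain only a few selected candidate configurations ('beacons'),
    measure how much retraining recovers, and use the average recovery to
    predict the post-retraining quality of the remaining candidates
    (illustrative; `evaluate` and `retrain` are placeholder callables)."""
    beacons = random.sample(candidates, num_beacons)
    recoveries = []
    for cfg in beacons:
        before = evaluate(cfg)            # quality without retraining
        after = evaluate(retrain(cfg))    # quality after retraining this beacon
        recoveries.append(after - before)
    avg_recovery = sum(recoveries) / len(recoveries)
    # Predicted retrained quality for candidates that were never retrained.
    return [(cfg, evaluate(cfg) + avg_recovery)
            for cfg in candidates if cfg not in beacons]
```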

Recurrent Neural Networks: An Embedded Computing Perspective [article]

Nesma M. Rezk, Madhura Purnaprajna, Tomas Nordström, Zain Ul-Abdin
2019 arXiv   pre-print
Then, we explain the components of RNN models from an implementation perspective. Furthermore, we discuss the optimizations applied to RNNs to run efficiently on embedded platforms.  ...  In this paper, we review the existing implementations of RNN models on embedded platforms and discuss the methods adopted to overcome the limitations of embedded systems.  ...  Activations were quantized after training and then the model was retrained for 40 epochs. The third approach is to use quantized parameters without training/retraining.  ... 
arXiv:1908.07062v2 fatcat:gf6k5fbztza6le2kwhuqpgdyku

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings

Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
2019 Neural Information Processing Systems  
We strive to reduce the energy cost during training by dropping unnecessary computations from three complementary levels: stochastic mini-batch dropping on the data level; selective layer update on the model level; and sign prediction for low-cost, low-precision back-propagation on the algorithm level.  ...  Acknowledgments The work is in part supported by the NSF RTML grant (1937592, 1937588).  ... 
dblp:conf/nips/WangJC0ZLW19 fatcat:m7au7ujky5cidny7jxbjgrfide

Retraining-Based Iterative Weight Quantization for Deep Neural Networks [article]

Dongsoo Lee, Byeongwook Kim
2018 arXiv   pre-print
In the proposed technique, weight quantization is followed by retraining the model with full-precision weights.  ...  In this work, we introduce an iterative technique to apply quantization, achieving a high compression ratio without any modifications to the training algorithm.  ...  For the Viterbi decompressor [14], we chose NUM_v = 50, NUM_c = 5, and R = 10 without a skip state for an 80% pruning rate on the PTB small and medium models.  ... 
arXiv:1805.11233v1 fatcat:pg4q3jinjbhupjotsdcv6wxq4q
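
The quantize-then-retrain loop described above can be sketched as follows: take a quantized snapshot, keep training the full-precision weights, and repeat. The toy gradient that nudges weights toward their quantized values stands in for a real training loss; it is an assumption for illustration, not the paper's objective.

```python
import numpy as np

def quantize(w, bits=3):
    """Uniform quantizer used as the per-iteration snapshot (illustrative)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

# Iterative scheme (sketch): quantize, then keep training the *full-precision*
# weights so they drift toward values that survive the next quantization.
w = np.random.randn(64, 64)
lr = 0.05
for it in range(5):                         # outer quantization iterations
    w_q = quantize(w)                       # quantized snapshot of the model
    for step in range(20):                  # retraining with full-precision weights
        grad = 0.01 * w + 0.1 * (w - w_q)   # toy loss gradient (placeholder objective)
        w = w - lr * grad
print("final quantization error:", np.abs(w - quantize(w)).mean())
```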
Showing results 1 — 15 out of 913 results