46,519 Hits in 3.8 sec

Structured Pruning of Large Language Models [article]

Ziheng Wang, Jeremy Wohlwend, Tao Lei
2019 arXiv   pre-print
Large language models have recently achieved state of the art performance across a wide variety of natural language tasks.  ...  Meanwhile, the size of these models and their latency have significantly increased, which makes their usage costly, and raises an interesting question: do language models need to be large?  ...  This work contributes to reducing the growing overhead of large language models, and shines a light on the role of model capacity in language modeling.  ... 
arXiv:1910.04732v1 fatcat:o2daer4ftraalg4jfnvssv6tgq
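The snippet names the technique but not its mechanics; as a rough illustration only (not the paper's specific method, which decides what to prune during training), the sketch below removes whole rows of a weight matrix by L2 norm using NumPy. The function name and the 50% keep ratio are made-up for the example.

```python
import numpy as np

def prune_rows_by_norm(weight: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Zero out entire rows of `weight`, keeping the `keep_ratio` fraction
    with the largest L2 norm. Rows correspond to output units, so a pruned
    row removes a whole hidden unit rather than scattered weights."""
    norms = np.linalg.norm(weight, axis=1)              # one score per row
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep_idx = np.argsort(norms)[-n_keep:]              # rows with largest norm
    pruned = np.zeros_like(weight)
    pruned[keep_idx] = weight[keep_idx]
    return pruned

# Example: prune half of the 8 output units of a random 8x16 layer.
W = np.random.randn(8, 16)
W_pruned = prune_rows_by_norm(W, keep_ratio=0.5)
print((np.linalg.norm(W_pruned, axis=1) > 0).sum(), "rows kept")
```

Because whole rows disappear, the kept weights can be repacked into a smaller dense matrix, which is what makes structured pruning cheaper at inference than unstructured sparsity.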

Structured Pruning of Large Language Models

Ziheng Wang, Jeremy Wohlwend, Tao Lei
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
Large language models have recently achieved state of the art performance across a wide variety of natural language tasks.  ...  We also demonstrate that our method can be applied to pruning adaptive word embeddings in large language models, and to pruning the BERT model on several downstream fine-tuning classification benchmarks  ...  We would also like to thank Hugh Perkins, Sam Bowman, Nicholas Matthews, Josh Shapiro and the other members of the Language Technology and Research teams who helped review this work and contributed their  ... 
doi:10.18653/v1/2020.emnlp-main.496 fatcat:n4rj2e6carcy3kiuzm3rmv355m

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning [article]

Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
2020 arXiv   pre-print
In this work, we propose an efficient transformer-based large-scale language representation using hardware-friendly block structured pruning.  ...  Pre-trained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks.  ...  In this work, we propose an efficient Transformer-based large-scale language representation using block structured pruning.  ... 
arXiv:2009.08065v4 fatcat:gef7hlznirgszirmoqauozmt2u
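As an illustration of the block-structured idea only (not the authors' reweighted training procedure), the sketch below zeroes whole b x b tiles of a weight matrix by Frobenius norm; the block size and sparsity level are assumptions.

```python
import numpy as np

def block_prune(weight, block=4, keep_ratio=0.5):
    """Zero out entire block x block tiles of `weight`, keeping the tiles
    with the largest Frobenius norm. Hardware-friendly because surviving
    weights stay in dense, regularly shaped blocks."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "sketch assumes divisible shapes"
    tiles = weight.reshape(rows // block, block, cols // block, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))      # norm of each tile
    n_keep = max(1, int(round(keep_ratio * norms.size)))
    thresh = np.sort(norms, axis=None)[-n_keep]
    mask = (norms >= thresh)[:, None, :, None]          # broadcast over tiles
    return (tiles * mask).reshape(rows, cols)

W = np.random.randn(8, 16)
print(np.count_nonzero(block_prune(W, block=4, keep_ratio=0.5)), "weights survive")
```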

Reducing Transformer Depth on Demand with Structured Dropout [article]

Angela Fan, Edouard Grave, Armand Joulin
2019 arXiv   pre-print
In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time.  ...  These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting.  ...  To handle the large vocabulary of WikiText-103, we follow Dauphin et al. (2017) and Baevski & Auli (2018) in using adaptive softmax and adaptive input for computational  ...
arXiv:1909.11556v1 fatcat:yhf6lreaz5alhdq3rhl2ga77su
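A minimal PyTorch-style sketch of the LayerDrop idea, assuming residual layers so that skipping a layer reduces to the identity map; the drop rate and the every-other-layer inference schedule are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Stack of residual layers where each layer is skipped with probability
    `p` during training; at inference a fixed subset of layers can be kept,
    which is what makes post-hoc depth pruning cheap."""
    def __init__(self, layers, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p

    def forward(self, x, keep_every=1):
        for i, layer in enumerate(self.layers):
            if self.training:
                if torch.rand(()) < self.p:          # randomly drop this layer
                    continue
            elif i % keep_every != 0:                # deterministic pruning at inference
                continue
            x = x + layer(x)                         # residual connection
        return x

# Example with tiny feed-forward "layers"; a real model would use Transformer blocks.
stack = LayerDropStack([nn.Linear(16, 16) for _ in range(6)], p=0.2)
stack.eval()
out = stack(torch.randn(2, 16), keep_every=2)        # keep every other layer
print(out.shape)
```

Because every layer sits on a residual path, dropping it leaves the representation unchanged rather than breaking the forward pass, which is why depth can be reduced on demand.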

Accelerating Natural Language Understanding in Task-Oriented Dialog [article]

Ojas Ahuja, Shrey Desai
2020 arXiv   pre-print
In this work, we show that a simple convolutional model compressed with structured pruning achieves largely comparable results to BERT on ATIS and Snips, with under 100K parameters.  ...  Task-oriented dialog models typically leverage complex neural architectures and large-scale, pre-trained Transformers to achieve state-of-the-art performance on popular natural language understanding benchmarks  ...  Distillation achieves similar results as structured pruning with 0-50% sparsity, but its performance largely drops off after 80%.  ... 
arXiv:2006.03701v1 fatcat:is2dx34gtndhjdrylgy5n233gm
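Neither the convolutional architecture nor the pruning criterion appears in the snippet; as a hedged illustration of structured pruning applied to a convolutional layer, the sketch below drops whole 1-D conv filters by L1 norm so the layer genuinely shrinks. Shapes and the keep ratio are assumptions.

```python
import numpy as np

def prune_conv_filters(conv_weight, keep_ratio=0.5):
    """conv_weight: (out_channels, in_channels, kernel_size).
    Returns a smaller weight tensor with the lowest-L1-norm filters removed;
    the returned `keep` indices must also be used to slice the next layer's
    input channels so the network stays consistent."""
    scores = np.abs(conv_weight).sum(axis=(1, 2))       # L1 norm per filter
    n_keep = max(1, int(round(keep_ratio * conv_weight.shape[0])))
    keep = np.sort(np.argsort(scores)[-n_keep:])        # indices of kept filters
    return conv_weight[keep], keep

W = np.random.randn(64, 32, 3)                          # 64 filters
W_small, kept = prune_conv_filters(W, keep_ratio=0.25)
print(W_small.shape)                                    # (16, 32, 3)
```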

Structured Pruning of a BERT-based Question Answering Model [article]

J.S. McCarley, Rishav Chakravarti, Avirup Sil
2021 arXiv   pre-print
The recent trend in industry-setting Natural Language Processing (NLP) research has been to operate large-scale pretrained language models like BERT under strict computational limits.  ...  In this paper, we investigate compressing BERT- and RoBERTa-based question answering systems by structured pruning of parameters from the underlying transformer model.  ...  While knowledge distillation from large pretrained language models (e.g.  ... 
arXiv:1910.06360v3 fatcat:bkjuy3q7xnfgha4yviwgbnor54
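The selection criterion used in the paper is not shown in the snippet; the sketch below only illustrates the structural unit involved, masking whole attention heads given importance scores that are assumed to be computed elsewhere (for example, by a gradient-based estimate).

```python
import numpy as np

def head_mask_from_scores(scores, keep_ratio=0.5):
    """Given one importance score per attention head, return a 0/1 mask that
    keeps the highest-scoring heads."""
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    mask = np.zeros(len(scores))
    mask[np.argsort(scores)[-n_keep:]] = 1.0
    return mask

def apply_head_mask(attn_output, mask):
    """attn_output: (batch, heads, seq_len, head_dim). Zeroing a head here is
    equivalent to removing its contribution before the output projection."""
    return attn_output * mask[None, :, None, None]

scores = np.random.rand(12)                     # e.g. 12 heads in one BERT layer
mask = head_mask_from_scores(scores, keep_ratio=1 / 3)
out = apply_head_mask(np.random.randn(2, 12, 128, 64), mask)
print(int(mask.sum()), "heads kept")
```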

Page 225 of Computational Linguistics Vol. 33, Issue 2 [page]

2007 Computational Linguistics  
Feature / Weight: language model (large) 1.00, language model (bitext) 1.03, P(y | x) 0.155, P(x | y) 1.23, P(y | x) 1.61, P..  ...  , as well as rules that contain multiple lexical items instead of one, an m-gram model whose structure cuts across the structure of context-free derivations, and large amounts of training data for meaningful  ... 
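For context (an assumption about the standard setup, not something stated in the snippet itself): feature weights like these are normally tuned for a log-linear translation model, where each feature's log-probability is scaled by its weight and the scaled terms are summed.

```latex
\log \mathrm{score}(D) \;=\; \sum_i \lambda_i \,\log \phi_i(D)
\;\approx\; 1.00\,\log P_{\mathrm{LM(large)}}(D) \;+\; 1.03\,\log P_{\mathrm{LM(bitext)}}(D) \;+\; 0.155\,\log P(y \mid x) \;+\; \cdots
```

Under that reading, a weight of 1.00 means the large language model's log-probability enters the total score unscaled.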

Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks [article]

Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma
2022 arXiv   pre-print
However, the standard XLSR model suffers from a language interference problem due to the lack of language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models.  ...  It makes room for language-specific modeling by pruning out unimportant parameters for each language, without requiring any manually designed language-specific component.  ...  The structure of the adapter module follows [22], and the projection dimension is set to 256 for the base and large models.  ... 
arXiv:2203.04583v1 fatcat:yl6h2naqazhxzntaoznoeznjcm
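The snippet describes carving out language-specific sub-networks by pruning parameters that are unimportant for each language; the sketch below shows only the mechanics of applying per-language binary masks to shared parameters. How the masks are obtained in the paper is not reproduced here; the toy scoring rule is an assumption.

```python
import numpy as np

def language_masks(shared_weight, languages, keep_ratio=0.7, seed=0):
    """Toy stand-in: build one binary mask per language over a shared weight
    matrix. Here the masks mix magnitude with random noise; a real system
    would derive them from per-language importance estimates."""
    rng = np.random.default_rng(seed)
    masks = {}
    for lang in languages:
        scores = np.abs(shared_weight) * rng.random(shared_weight.shape)
        thresh = np.quantile(scores, 1 - keep_ratio)
        masks[lang] = (scores >= thresh).astype(shared_weight.dtype)
    return masks

def forward_for_language(x, shared_weight, masks, lang):
    """Each language only 'sees' its own sub-network of the shared layer."""
    return x @ (shared_weight * masks[lang]).T

W = np.random.randn(32, 16)
masks = language_masks(W, ["en", "zh", "sw"])
y = forward_for_language(np.random.randn(4, 16), W, masks, "zh")
print(y.shape)
```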

Reweighted Proximal Pruning for Large-Scale Language Representation [article]

Fu-Ming Guo, Sijia Liu, Finlay S. Mungall, Xue Lin, Yanzhi Wang
2019 arXiv   pre-print
In this paper, we propose Reweighted Proximal Pruning (RPP), a new pruning method specifically designed for a large-scale language representation model.  ...  Is it possible to compress these large-scale language representation models? How will the pruned language representation affect the downstream multi-task transfer learning objectives?  ...  This is necessary in the weight pruning of super-deep language representation models.  ... 
arXiv:1909.12486v2 fatcat:2vu6giuusrc35pq25pia2vbk4e
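RPP's full algorithm is not in the snippet; as a rough sketch of the two ingredients its name points to, the code below takes one proximal (soft-thresholding) step for an L1 penalty whose per-weight strength is reweighted to be larger for weights that are already small. Step size, penalty strength, and the reweighting constant are illustrative assumptions.

```python
import numpy as np

def reweighted_prox_step(w, grad, lr=0.01, lam=0.001, eps=1e-6):
    """One proximal-gradient update with a reweighted L1 penalty:
    1) gradient step on the task loss,
    2) soft-thresholding, where the threshold for each weight is
       proportional to lam / (|w| + eps), so small weights are driven
       to exactly zero more aggressively than large ones."""
    z = w - lr * grad                         # plain SGD step
    thresh = lr * lam / (np.abs(w) + eps)     # per-weight reweighted threshold
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

w = np.random.randn(10) * 0.1
grad = np.random.randn(10)
w_new = reweighted_prox_step(w, grad)
print(np.count_nonzero(w_new), "nonzero weights after one step")
```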

Block Pruning For Faster Transformers [article]

François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush
2021 arXiv   pre-print
Our approach extends structured methods by considering blocks of any size and integrates this structure into the movement pruning paradigm for fine-tuning.  ...  We find that this approach learns to prune out full components of the underlying model, such as attention heads.  ...  There has been a growing interest in the compression of pre-trained language models. We consider three varieties of methods: distillation, pruning, and structured pruning.  ... 
arXiv:2109.04838v1 fatcat:44uzhne4lndfzeesdhqevdg2cm
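A rough sketch in the spirit of movement pruning extended to blocks (the paper's actual training integration is more involved): accumulate a first-order "movement" score per weight over training steps, sum the scores within each block, and drop the lowest-scoring blocks. The block size and pruning fraction are assumptions.

```python
import numpy as np

def accumulate_movement(score, weight, grad, lr=1.0):
    """Movement-style importance: weights whose gradient pushes them away
    from zero accumulate positive score, weights being pushed toward zero
    accumulate negative score."""
    return score - lr * grad * weight

def block_scores(score, block=4):
    r, c = score.shape
    return score.reshape(r // block, block, c // block, block).sum(axis=(1, 3))

# Toy loop: pretend we trained for 100 steps, then drop the worst half of blocks.
W = np.random.randn(8, 16)
S = np.zeros_like(W)
for _ in range(100):
    fake_grad = np.random.randn(*W.shape) * 0.01       # stand-in for real gradients
    S = accumulate_movement(S, W, fake_grad)
bs = block_scores(S, block=4)
keep = bs >= np.median(bs)
mask = np.repeat(np.repeat(keep, 4, axis=0), 4, axis=1)
W_pruned = W * mask
print("blocks kept:", int(keep.sum()), "of", keep.size)
```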

On Compressing N-Gram Language Models

Teemu Hirsimaki
2007 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07  
The major part of the memory consumption of large-vocabulary continuous speech recognition systems is usually due to the size of statistical language models.  ...  In the rest of the paper, we assume that the language model is represented in a common back-off format.  ... 
doi:10.1109/icassp.2007.367228 dblp:conf/icassp/Hirsimaki07 fatcat:tvlojckk3zda3fekg3vjh7r73e
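Since the snippet assumes the standard back-off representation, a small lookup sketch may help: if the full n-gram was seen, use its stored (discounted) probability; otherwise multiply the back-off weight of the shortened history by the probability of the shorter n-gram. The toy probabilities below are made up.

```python
def backoff_prob(lm, words):
    """lm maps a word tuple to (log10_prob, log10_backoff), as in ARPA files.
    Returns the log10 probability of the last word given the preceding ones."""
    words = tuple(words)
    if words in lm:                                  # n-gram was seen in training
        return lm[words][0]
    if len(words) == 1:                              # unseen unigram: give up
        return float("-inf")
    hist = words[:-1]
    bow = lm[hist][1] if hist in lm else 0.0         # back-off weight of the history
    return bow + backoff_prob(lm, words[1:])         # recurse on the shorter n-gram

# Tiny made-up model: (log10 probability, log10 back-off weight)
lm = {
    ("the",): (-1.0, -0.3),
    ("cat",): (-2.0, -0.2),
    ("the", "cat"): (-0.5, -0.1),
    ("cat", "sat"): (-0.8, 0.0),
    ("sat",): (-2.5, 0.0),
}
print(backoff_prob(lm, ("the", "cat")))      # seen bigram: -0.5
print(backoff_prob(lm, ("the", "sat")))      # backs off: -0.3 + (-2.5) = -2.8
```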

An Approach to Pruning Metamodels like UML

Zhiyi Ma
2017 Proceedings of the 5th International Conference on Model-Driven Engineering and Software Development  
There are a large number of modeling languages based on metamodels, and many of the languages are large and complex. In many cases, only part of a metamodel is needed.  ...  By deeply analyzing characteristics such as the special relations between packages and the step-by-step strict-definition mechanism of modeling concepts, this paper presents an approach to pruning metamodels  ...  This work was supported by the National Natural Science Foundation of China (No. 61672046).  ... 
doi:10.5220/0006144004090417 dblp:conf/modelsward/Ma17 fatcat:yuq42ayw4nd3hex5xi4lc3oaky
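The paper's specific pruning rules are not in the snippet; a generic way to prune a metamodel is to start from the metaclasses you need and keep everything they transitively depend on (supertypes, types of mandatory features), deleting the rest. The tiny dependency-closure sketch below illustrates only that idea; the example metaclasses and dependencies are made up.

```python
def prune_metamodel(dependencies, needed):
    """dependencies: dict mapping each metaclass to the metaclasses it
    requires (supertypes, referenced types of mandatory features).
    Returns the set of metaclasses that must be kept."""
    keep, stack = set(), list(needed)
    while stack:
        cls = stack.pop()
        if cls in keep:
            continue
        keep.add(cls)
        stack.extend(dependencies.get(cls, ()))
    return keep

# Toy UML-like fragment: keeping only 'Class' still drags in its dependencies.
deps = {
    "Class": ["Classifier", "Property"],
    "Classifier": ["NamedElement"],
    "Property": ["TypedElement"],
    "TypedElement": ["NamedElement"],
    "UseCase": ["Classifier"],
    "NamedElement": [],
}
print(sorted(prune_metamodel(deps, ["Class"])))
# ['Class', 'Classifier', 'NamedElement', 'Property', 'TypedElement']
```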

Structured Sparsification of Gated Recurrent Neural Networks

Ekaterina Lobacheva, Nadezhda Chirkova, Alexander Markovich, Dmitry Vetrov
2020 Proceedings of the AAAI Conference on Artificial Intelligence  
We test our approach on the text classification and language modeling tasks. Our method improves the neuron-wise compression of the model in most of the tasks.  ...  We also observe that the resulting structure of gate sparsity depends on the task and connect the learned structures to the specifics of the particular tasks.  ...  For the large model (fig. 7), the structure is slightly different than for the small model.  ... 
doi:10.1609/aaai.v34i04.5938 fatcat:qfixhxyojbextd77pvxsvhk6yy
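One standard way to induce the neuron- and gate-wise sparsity the snippet describes, shown below as an assumption rather than the authors' exact objective, is a group-lasso penalty where each group collects all weights that produce one hidden unit, so whole units can be driven to zero.

```python
import torch
import torch.nn as nn

def neuron_group_lasso(lstm: nn.LSTM, lam: float = 1e-4) -> torch.Tensor:
    """Group-lasso regularizer over an LSTM's hidden units: for each hidden
    unit, gather every input-to-hidden and hidden-to-hidden weight that
    produces it (across all four gates) and penalize the group's L2 norm.
    Units whose group norm is pushed to ~0 can be removed structurally."""
    h = lstm.hidden_size
    w = torch.cat([lstm.weight_ih_l0, lstm.weight_hh_l0], dim=1)  # (4h, in+h)
    groups = w.view(4, h, -1).permute(1, 0, 2).reshape(h, -1)     # one row per unit
    return lam * groups.norm(dim=1).sum()

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
penalty = neuron_group_lasso(lstm)          # add this to the task loss during training
print(penalty.item())
```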

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM [article]

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
2021 arXiv   pre-print
Recently, hardware manufacturers have introduced dedicated hardware for NxM sparsity to provide the flexibility of unstructured pruning with the runtime efficiency of structured approaches.  ...  To address such an issue in a principled manner, we introduce a new learning framework, called NxMTransformer, to induce NxM semi-structured sparsity on pretrained language models for natural language  ...  We thank the anonymous NeurIPS reviewers for their constructive comments.  ... 
arXiv:2110.15766v1 fatcat:4q72ovgalbcbnmmuuakrn4paze
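A minimal sketch of the NxM constraint itself, independent of the ADMM-based optimization the paper uses: in every group of M consecutive weights along a row, keep only the N of largest magnitude (for example 2 out of every 4, the pattern current sparse tensor hardware accelerates). Matrix shapes below are arbitrary.

```python
import numpy as np

def enforce_n_of_m(weight, n=2, m=4):
    """Zero all but the n largest-magnitude entries in every consecutive
    group of m weights along each row (the '2:4' pattern for n=2, m=4)."""
    rows, cols = weight.shape
    assert cols % m == 0, "sketch assumes the row length is a multiple of m"
    groups = weight.reshape(rows, cols // m, m)
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

W = np.random.randn(4, 8)
W24 = enforce_n_of_m(W, n=2, m=4)
print((W24.reshape(4, 2, 4) != 0).sum(axis=-1))   # every group keeps exactly 2 nonzeros
```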

Compression of Deep Learning Models for Text: A Survey [article]

Manish Gupta, Puneet Agrawal
2021 arXiv   pre-print
of such models to enable their deployment in real industry NLP projects. Given the critical need of building applications with efficient and small models, and the large amount of recently published work  ...  In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated  ...  While weight pruning theoretically leads to pruning to a large extent, practical implementation of sparse data structures is difficult. Pruning and regularization need to be done together carefully.  ... 
arXiv:2008.05221v4 fatcat:6frf2wzi7zganaqgkuvy4szgmq
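To make the survey's point about sparse data structures concrete: unstructured pruning only pays off if the surviving weights are stored in a compressed format such as CSR, sketched below with NumPy only (scipy.sparse provides this for real use; the matrix and sparsity level are arbitrary).

```python
import numpy as np

def dense_to_csr(mat):
    """Compressed Sparse Row: store only nonzero values, their column ids,
    and one row-pointer array. The irregular row lengths are what make fast
    sparse kernels hard to write, which is the survey's point."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values, dtype=float), np.array(col_idx, dtype=int), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = values[lo:hi] @ x[col_idx[lo:hi]]
    return y

W = np.random.randn(4, 6) * (np.random.rand(4, 6) > 0.7)   # roughly 70% zeros
v, c, r = dense_to_csr(W)
print(np.allclose(csr_matvec(v, c, r, np.ones(6)), W @ np.ones(6)))
```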
Showing results 1-15 out of 46,519