
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models [article]

Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh
2022 arXiv   pre-print
We introduce Optimal BERT Surgeon (oBERT), an efficient and accurate weight pruning method based on approximate second-order information, which we show to yield state-of-the-art results in both stages  ...  Specifically, oBERT extends existing work on unstructured second-order pruning by allowing for pruning blocks of weights, and by being applicable at the BERT scale.  ...  We introduce a general second-order pruning method called Optimal BERT Surgeon (oBERT), which supports unstructured and semi-structured pruning, and is the first second-order method to be both highly-accurate  ... 
arXiv:2203.07259v2 fatcat:5qjip6bwfjhw7pbxuooox2c3ym

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
2022 ACM Transactions on Design Automation of Electronic Systems  
To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization.  ...  We start from introducing popular model compression methods, including pruning, factorization, quantization, as well as compact model design.  ...  Besides, NLP models have large opportunities for activation pruning and quantization because of the redundancy of human languages.  ... 
doi:10.1145/3486618 fatcat:h6xwv2slo5eklift2fl24usine

Leveraging Sparse Linear Layers for Debuggable Deep Networks [article]

Eric Wong, Shibani Santurkar, Aleksander Mądry
2021 arXiv   pre-print
We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.  ...  The code for our toolkit can be found at  ... 
arXiv:2105.04857v1 fatcat:fqwhkbr7krhnzal5ys74f5euci

Applications and Techniques for Fast Machine Learning in Science

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik (+35 others)
2022 Frontiers in Big Data  
The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for  ...  In this community review report, we discuss applications and techniques for fast machine learning (ML) in science—the concept of integrating powerful ML methods into the real-time experimental data processing  ...  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon. Burlington: Morgan Kaufmann. Hassibi, B., Stork, D. G., and Wolff, G. J. (1993).  ... 
doi:10.3389/fdata.2022.787421 pmid:35496379 pmcid:PMC9041419 fatcat:5w2exf7vvrfvnhln7nj5uppjga

Scaling Language Models: Methods, Analysis & Insights from Training Gopher [article]

Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan (+68 others)
2022 arXiv   pre-print
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.  ...  Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.  ...  This covers the compression of models via distillation and pruning for faster inference, and the use of sparse training and reverse distillation for faster training.  ... 
arXiv:2112.11446v2 fatcat:wtajhbesibbetikkpow2vwiwqq

Efficient Training and Compression of Deep Neural Networks

James O'Neill
The application of deep neural networks is widespread throughout the world and is responsible for many crucial applications such as self-driving cars, machine translation, spoken language recognition,  ...  Although this research area has been somewhat active for the past three decades, it has seen a notable and proportional resurgence recently due to the rate of model size increase in deep neural networks  ...  Pruning using Second Order Derivatives optimal brain damage As mentioned, deleting single weights is computationally inefficient and slow.  ... 
doi:10.17638/03157802 fatcat:kboe4vvizfcyhlbdpx6lw7cnzm