14 Hits in 7.8 sec

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training [article]

Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
2022 arXiv   pre-print
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original  ...  Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream  ...  To this end, we propose to search transferable BERT subnetworks via Task-Agnostic Mask Training (TAMT), which learns selective binary masks over the model weights on pre-training tasks.  ... 
arXiv:2204.11218v2 fatcat:42zvh7c7qnecrfjv52ergftuum

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
2022 Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original  ...  Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream  ...  To this end, we propose to search transferable BERT subnetworks via Task-Agnostic Mask Training (TAMT), which learns selective binary masks over the model weights on pre-training tasks.  ... 
doi:10.18653/v1/2022.naacl-main.428 fatcat:bdqhkxu7czgrfiuoqbjd23dcby
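
The mask-training idea described in the TAMT records above can be illustrated with a minimal PyTorch sketch, assuming a frozen pre-trained weight matrix and one real-valued score per weight that is binarized with a straight-through estimator; the class name, threshold, and toy training loop are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose frozen pre-trained weight is gated by a learned binary mask."""
    def __init__(self, weight: torch.Tensor, bias: torch.Tensor, init_scale: float = 0.01):
        super().__init__()
        self.weight = nn.Parameter(weight, requires_grad=False)    # frozen pre-trained weight
        self.bias = nn.Parameter(bias, requires_grad=False)
        self.mask_scores = nn.Parameter(init_scale * torch.randn_like(weight))  # real-valued scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.mask_scores > 0).float()                      # binarize: keep weights with positive score
        # straight-through estimator: forward uses the hard mask, gradients flow to the scores
        mask = hard + self.mask_scores - self.mask_scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)

# toy usage: learn a mask over a frozen "pre-trained" weight on a dummy objective
layer = MaskedLinear(torch.randn(8, 16), torch.zeros(8))
opt = torch.optim.Adam([layer.mask_scores], lr=1e-2)
for _ in range(10):
    x, target = torch.randn(4, 16), torch.randn(4, 8)
    loss = ((layer(x) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the task-agnostic setting the loss would come from the pre-training objectives (e.g. masked language modeling) rather than the toy regression target used here.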

Playing Lottery Tickets with Vision and Language [article]

Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
2021 arXiv   pre-print
However, we can find "relaxed" winning tickets at 50%  ...  Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%  ...  transfer universally  ...  In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models.  ...  Not only can task-specific winning tickets be found when running IMP on each downstream task separately, a task-agnostic winning ticket is also discovered via IMP on joint pre-training.  ... 
arXiv:2104.11832v2 fatcat:cwme4bil2vhdfjcgc7tsauxhwe
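
Iterative magnitude pruning (IMP) with rewinding, which the snippet above refers to, follows a simple loop. The sketch below is a generic, hedged outline in PyTorch in which `train_fn` (fine-tuning with the masks applied), the number of rounds, and the per-round pruning fraction are placeholders rather than the paper's exact protocol.

```python
import torch

def imp_find_ticket(model, train_fn, rounds: int = 5, prune_frac: float = 0.2):
    """Iterative magnitude pruning with rewinding: fine-tune, prune the
    smallest surviving weights, rewind the rest to the pre-trained values."""
    init_state = {k: v.clone() for k, v in model.state_dict().items()}        # pre-trained weights
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model, masks)                                     # fine-tune with the masks applied
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p.detach().abs()[masks[name].bool()]           # magnitudes of surviving weights
            threshold = alive.sort().values[int(prune_frac * alive.numel())]
            masks[name] = masks[name] * (p.detach().abs() > threshold).float()
        model.load_state_dict(init_state)                          # rewind to the pre-trained weights
    return masks
```

Running this on each downstream task separately yields task-specific tickets, while running it on the joint pre-training objective yields a single task-agnostic mask, which is the distinction the abstract draws.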

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning [article]

Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Lin
2022 arXiv   pre-print
In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than original DNNs.  ...  In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term as SuperTickets, via  ...  Acknowledgement We would like to acknowledge the funding support from the NSF NeTS funding (Award number: 1801865) and NSF SCH funding (Award number: 1838873) for this project.  ... 
arXiv:2207.03677v2 fatcat:eozidlwdsbamdis4zrryu2vcri
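
As a rough illustration of searching and pruning in one loop, the sketch below combines a DARTS-style mixed operation (a softmax-weighted sum of candidate ops) with periodic magnitude pruning of the operation weights. It is a generic toy under assumed candidate ops, pruning fraction, and schedule, not the SuperTickets algorithm itself.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One supernet edge: a softmax-weighted mixture of candidate operations."""
    def __init__(self, dim: int):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim, bias=False), nn.Identity()])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))      # architecture weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

def prune_step(module: nn.Module, frac: float = 0.1):
    """Zero out the lowest-magnitude `frac` of every weight matrix, in place."""
    with torch.no_grad():
        for p in module.parameters():
            if p.dim() > 1:
                flat = p.abs().flatten()
                k = int(frac * flat.numel())
                if k > 0:
                    threshold = flat.sort().values[k]
                    p.mul_((p.abs() > threshold).float())
```

During search one would alternate optimizer steps on the operation weights and on `alpha`, calling `prune_step` every few epochs, so that the surviving subnetwork and the architecture are identified together rather than in separate stages.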

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models [article]

Clara Na, Sanket Vaibhav Mehta, Emma Strubell
2022 arXiv   pre-print
...of parameters compared to standard Adam optimization when fine-tuning BERT models, leading to higher rates of compression with little to no loss in accuracy on the GLUE classification benchmark.  ...  while performing task-specific pruning, which we hypothesize should lead to simpler parameterizations and thus more compressible models.  ...  We experiment with both standard pruning and Lottery Ticket-style IMP in our setting exploring pre-trained BERT language models, similar to  ... 
arXiv:2205.12694v1 fatcat:ri6hahgygjcgxpmukopfjrt7n4
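
Sharpness-aware minimization (SAM) itself is easy to sketch: ascend to the worst-case point within an L2 ball of radius rho around the current weights, then apply the base optimizer using the gradient computed there. The helper below is a minimal, generic SAM step (the rho value and the `loss_fn(model, batch)` signature are assumptions), not the paper's training code.

```python
import torch

def sam_step(model, loss_fn, batch, base_opt, rho: float = 0.05):
    """One sharpness-aware minimization step: perturb the weights toward the
    gradient direction (radius rho), then step the base optimizer using the
    gradients computed at the perturbed point."""
    # first forward/backward: gradients at the current weights
    loss = loss_fn(model, batch)
    loss.backward()
    grads = [p.grad.detach().clone() if p.grad is not None else None for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))

    # ascend to the worst-case nearby point w + epsilon
    eps = []
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            e = rho * g / (grad_norm + 1e-12) if g is not None else None
            if e is not None:
                p.add_(e)
            eps.append(e)

    # second forward/backward: gradients at the perturbed weights
    base_opt.zero_grad()
    loss_fn(model, batch).backward()

    # restore the original weights, then apply the base optimizer update
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()
```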

An Overview of Neural Network Compression [article]

James O'Neill
2020 arXiv   pre-print
Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage  ...  Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing.  ...  You et al. (2019) identify what they term 'early-bird' tickets (i.e. winning tickets that emerge early in training) using a combination of early stopping, low-precision training and large learning rates.  ... 
arXiv:2006.03669v2 fatcat:u2p6gvwhobh53hfjxawzclw7fq
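
The 'early-bird' criterion mentioned in the snippet can be sketched as a mask-distance test: recompute the global magnitude-pruning mask after each epoch and stop the expensive dense training once the mask stops changing. The functions and the 2% tolerance below are illustrative assumptions, not the surveyed authors' code.

```python
import torch

def magnitude_mask(model, sparsity: float = 0.5):
    """Global magnitude mask: 1 for kept weights, 0 for the lowest-magnitude `sparsity` fraction."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters() if p.dim() > 1])
    threshold = scores.sort().values[int(sparsity * scores.numel())]
    return torch.cat([(p.detach().abs() > threshold).flatten().float()
                      for p in model.parameters() if p.dim() > 1])

def found_early_bird(prev_mask: torch.Tensor, curr_mask: torch.Tensor, tol: float = 0.02) -> bool:
    """Early-bird criterion: the pruning mask has (nearly) stopped changing between epochs."""
    hamming = (prev_mask != curr_mask).float().mean().item()
    return hamming < tol
```

In use, one would call `magnitude_mask` at the end of every epoch and, once `found_early_bird` returns True, prune with the current mask and train only the resulting ticket.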

Structured pruning for deep learning language models [article]

Stefanos Stamatis Achlatis, National Technical University of Athens
2022
We propose a better implementation of pruning that considers both the pre-trained and the fine-tuned model.  ...  In this Diploma Thesis, we study the compression of Deep Neural Networks, and more precisely, we study structured pruning in Natural Language Processing models.  ...  (Algorithm excerpt: s_T denotes the sparsity threshold; the procedure returns the mask s.) After extracting the mask s, we can perform a one-shot structured Lottery Ticket Hypothesis experiment on BERT: use the mask s on the pre-trained BERT and fine-tune it to the  ... 
doi:10.26240/heal.ntua.22766 fatcat:7tk6yltvgvfghfvuvfvvtizc4a
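
With Hugging Face Transformers, a structured head mask like the s described above can be applied to a pre-trained BERT either by passing `head_mask` at every forward call or by physically removing the masked heads before fine-tuning. The checkpoint name, label count, and the "keep the first 8 heads" stand-in mask below are placeholders for whatever the pruning step actually produced.

```python
import torch
from transformers import BertForSequenceClassification

# load the pre-trained (not yet fine-tuned) checkpoint; model/label choices are placeholders
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# s: a (num_layers, num_heads) 0/1 mask over attention heads, produced by the pruning
# step; as a stand-in, keep only the first 8 heads of every layer
s = torch.zeros(model.config.num_hidden_layers, model.config.num_attention_heads)
s[:, :8] = 1.0

# option 1: pass the mask at every forward call while fine-tuning
#   outputs = model(input_ids=..., attention_mask=..., head_mask=s, labels=...)

# option 2: physically remove the masked heads once, then fine-tune the smaller model
heads_to_prune = {layer: (s[layer] == 0).nonzero().flatten().tolist()
                  for layer in range(model.config.num_hidden_layers)}
model.prune_heads(heads_to_prune)
```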

Finding the Dominant Winning Ticket in Pre-Trained Language Models

Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan
2022 Findings of the Association for Computational Linguistics: ACL 2022   unpublished
In this paper, we study whether there is a winning lottery ticket for pre-trained language models, which allows practitioners to fine-tune the parameters in the ticket but achieve good downstream performance  ...  tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix.  ...  Acknowledgments This work is supported in part by the National Hi-Tech R&D Program of China (No.2020AAA0106600).  ... 
doi:10.18653/v1/2022.findings-acl.115 fatcat:vib7ttjq3rfijixibcn3vtdu2m
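
Fine-tuning only the parameters inside a ticket can be approximated by masking gradients before each optimizer step, as in the sketch below; the mask dictionary keyed by parameter name is an assumed format, and optimizers with decoupled weight decay can still nudge the "frozen" weights unless decay is disabled for them.

```python
import torch

def masked_finetune_step(model, masks, loss, optimizer):
    """Update only the parameters inside the winning ticket: zero the gradients
    of all weights outside the per-parameter 0/1 masks before stepping."""
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                p.grad.mul_(masks[name])        # weights outside the ticket receive no update
    optimizer.step()
```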

What's Hidden in a One-layer Randomly Weighted Transformer?

Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
The lottery ticket hypothesis for pre-trained BERT networks. arXiv preprint. Playing lottery tickets with vision and language. arXiv preprint arXiv:2104.11832.  ...  In a general pruning framework, Lottery Tickets Hypothesis.  ... 
doi:10.18653/v1/2021.emnlp-main.231 fatcat:qhzyr4d3rjcsjgezsjwhiue5vu

Efficient Training and Compression of Deep Neural Networks

James O'Neill
2022
...typical resources available to the majority of machine learning practitioners.  ...  Therefore, in practice, dense matrix multiplications are carried out on a sparse network by multiplying the parameter tensors with a binary mask, leading to more parameters, not less.  ...  better pruning masks, i.e. lottery tickets.  ... 
doi:10.17638/03157802 fatcat:kboe4vvizfcyhlbdpx6lw7cnzm
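
The point about masked dense multiplication is easy to see in a few lines: unstructured pruning is typically implemented by storing a full-size binary mask next to the dense weight, so the matrix multiply stays dense and nothing is saved until the surviving weights are moved to a sparse or structured representation. The shapes and sparsity below are arbitrary illustrations, not the thesis's code.

```python
import torch

# unstructured pruning as usually implemented: the dense weight stays in memory,
# gated by a same-shaped binary mask, so the matrix multiply is still fully dense
weight = torch.randn(768, 768)
mask = (torch.rand_like(weight) > 0.9).float()        # keep roughly 10% of the weights
x = torch.randn(32, 768)

y = x @ (weight * mask).T                              # dense GEMM: no FLOPs or memory saved

# real savings need a sparse (or structured) representation of the survivors
sparse_w = (weight * mask).to_sparse()                 # COO storage of the non-zero weights
y_sparse = torch.sparse.mm(sparse_w, x.T).T            # same result, computed over kept weights only
```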

Representation Learning and Learning from Limited Labeled Data for Community Question Answering

Andreas Rücklé
2021
Accessing the vast knowledge in them more effectively is a fundamental goal of many tasks in natural language processing.  ...  Besides purely monolingual approaches, we study how to transfer text representations across languages.  ...  I would also like to thank Prof. Dr. Jonathan Berant and Prof. Dr. Goran Glavaš for investing their time in reviewing my thesis.  ... 
doi:10.26083/tuprints-00018508 fatcat:lrqvfpbsvbb2lfo4bqrwu3jaym

Dynamic Mathematics for Automated Machine Learning Techniques [article]

Nicholas Kuo, The Australian National University
2021
However, modern machine learning techniques such as backpropagation training were firmly established in 1986, while computer vision was revolutionised in 2012 with the introduction of AlexNet.  ...  "Because they are difficult to implement in practice." I'd like to use machine learning, but I can't invest much time.  ...  The lottery ticket refers to the smallest subnetwork consisting of all essential connections within a much larger dense network. Frankle and Carbin sought their candidate ticket via cyclic training.  ... 
doi:10.25911/zmy2-7160 fatcat:flnkwfv33rbupg2e5m4twnbaie

Insights from Deep Representations for Machine Learning Systems and Human Collaborations

Maithreyi Raghu
2020
Finally, we study how these fully trained AI systems can be adapted to work effectively with human experts, resulting in better outcomes than either humans or AI alone.  ...  In this thesis, we present research results that take steps to addressing these challenges.  ...  A good overview of BERT and transfer learning in NLP is given in http://jalammar.github.io/illustrated-bert/.  ... 
doi:10.7298/xvk2-m314 fatcat:f3qjq56xyrdbpognytmc6oizsu

Transformer-based NMT : modeling, training and implementation [article]

Hongfei Xu, Universität des Saarlandes
2021
Transformer-Based NMT: Modeling, Training and Implementation  ...  I also thank our supporters, collaborators and anonymous reviewers for their efforts in helping us improve our work.  ...  For example, the Lottery Ticket (LT) hypothesis (Frankle and Carbin, 2019; Frankle et al., 2019; Dettmers and Zettlemoyer, 2019) suggests that there is a sparse sub-network in a dense network that outperforms  ... 
doi:10.22028/d291-34998 fatcat:4d226kujkjeodaiwzxltmgluhi