Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training
[article]
2022
arXiv
pre-print
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original ...
Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream ...
To this end, we propose to search transferable BERT subnetworks via Task-Agnostic Mask Training (TAMT), which learns selective binary masks over the model weights on pre-training tasks. ...
arXiv:2204.11218v2
fatcat:42zvh7c7qnecrfjv52ergftuum
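The TAMT entry above describes learning selective binary masks over frozen BERT weights on the pre-training objective. A minimal, hypothetical sketch of that idea in PyTorch, using per-weight scores binarized with a straight-through estimator (the module, the magnitude-based score initialization, and the top-k threshold rule are assumptions for illustration, not the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer with frozen pre-trained weights and a trainable binary mask."""
    def __init__(self, linear: nn.Linear, sparsity: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
        # Real-valued scores; their top-k entries define the binary mask.
        self.scores = nn.Parameter(self.weight.abs().clone())
        self.sparsity = sparsity

    def forward(self, x):
        k = max(1, int(self.scores.numel() * (1 - self.sparsity)))
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: binary mask in the forward pass,
        # identity gradient to the scores in the backward pass.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask, self.bias)
```

Under this sketch, only `scores` receives gradients while the model is trained on a pre-training loss such as masked language modeling; the resulting binary mask then defines the subnetwork that is fine-tuned on downstream tasks.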
Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training
2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
unpublished
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original ...
Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream ...
To this end, we propose to search transferable BERT subnetworks via Task-Agnostic Mask Training (TAMT), which learns selective binary masks over the model weights on pre-training tasks. ...
doi:10.18653/v1/2022.naacl-main.428
fatcat:bdqhkxu7czgrfiuoqbjd23dcby
Playing Lottery Tickets with Vision and Language
[article]
2021
arXiv
pre-print
However, we can find "relaxed" winning tickets at 50% sparsity ... Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60% sparsity transfer universally ...
In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models. ...
Not only can task-specific winning tickets be found when running IMP on each downstream task separately, but a task-agnostic winning ticket is also discovered via IMP on joint pre-training. ...
arXiv:2104.11832v2
fatcat:cwme4bil2vhdfjcgc7tsauxhwe
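The entry above finds its tickets with iterative magnitude pruning (IMP). A rough, hypothetical sketch of the generic IMP-with-rewinding loop (the `train_fn` interface, pruning rate, and round count are assumptions, not the vision-and-language code used in the paper):

```python
import copy
import torch

def imp_find_ticket(model, train_fn, rounds=5, prune_rate=0.2):
    """Generic iterative magnitude pruning: train, prune the lowest-magnitude
    surviving weights, rewind the survivors to their initial values, repeat."""
    init_state = copy.deepcopy(model.state_dict())            # weights to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train_fn(model, masks)                                # caller trains with masks applied
        with torch.no_grad():
            # Pool magnitudes of surviving weights and pick a global threshold.
            surviving = torch.cat([(p.abs() * masks[n]).flatten()
                                   for n, p in model.named_parameters()])
            surviving = surviving[surviving > 0]
            threshold = surviving.sort().values[int(prune_rate * surviving.numel())]
            for n, p in model.named_parameters():
                masks[n] *= (p.abs() > threshold).float()     # drop lowest prune_rate fraction
                p.copy_(init_state[n] * masks[n])             # rewind survivors, zero the rest
    return masks
```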
SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
[article]
2022
arXiv
pre-print
In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than original DNNs. ...
In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term as SuperTickets, via ...
Acknowledgement We would like to acknowledge the funding support from the NSF NeTS funding (Award number: 1801865) and NSF SCH funding (Award number: 1838873) for this project. ...
arXiv:2207.03677v2
fatcat:eozidlwdsbamdis4zrryu2vcri
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
[article]
2022
arXiv
pre-print
of parameters compared to standard Adam optimization when fine-tuning BERT models, leading to higher rates of compression with little to no loss in accuracy on the GLUE classification benchmark. ...
while performing task-specific pruning, which we hypothesize should lead to simpler parameterizations and thus more compressible models. ...
We experiment with both standard pruning and Lottery Ticket-style IMP in our setting exploring pre-trained BERT language models, similar to . ...
arXiv:2205.12694v1
fatcat:ri6hahgygjcgxpmukopfjrt7n4
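The "Train Flat, Then Compress" entry fine-tunes with sharpness-aware minimization (SAM) before pruning. A bare-bones sketch of a single SAM update, assuming a closure-style `loss_fn` and a default `rho` (PyTorch has no built-in SAM optimizer, so this is an illustrative stand-in):

```python
import torch

def sam_step(model, loss_fn, base_optimizer, rho=0.05):
    """One sharpness-aware minimization step: perturb weights toward the local
    worst case, take the gradient there, then update from the original point."""
    loss_fn(model).backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)   # ascent direction
            p.add_(e)                                # w -> w + e (worst-case neighbour)
            eps.append(e)

    model.zero_grad()
    loss_fn(model).backward()                        # gradient at the perturbed point

    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                            # restore the original weights
    base_optimizer.step()                            # apply the sharpness-aware gradient
    base_optimizer.zero_grad()
```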
An Overview of Neural Network Compression
[article]
2020
arXiv
pre-print
Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage ...
Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. ...
You et al. (2019) identify what they refer to as 'early-bird' tickets (i.e., winning tickets found early in training) using a combination of early stopping, low-precision training and large learning rates. ...
arXiv:2006.03669v2
fatcat:u2p6gvwhobh53hfjxawzclw7fq
Structured pruning for deep learning language models
[article]
2022
We propose a better implementation of pruning that considers both the pre-trained and the fine-tuned model. ...
In this Diploma Thesis, we study the compression of Deep Neural Networks, and more precisely, we study the structured pruning in Natural Language Processing models. ...
... After extracting the mask s, we can perform a one-shot structured Lottery Ticket Hypothesis on BERT: use the mask s on the pre-trained BERT and fine-tune it to the ...
doi:10.26240/heal.ntua.22766
fatcat:7tk6yltvgvfghfvuvfvvtizc4a
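The snippet above outlines the one-shot recipe: extract a binary mask s, apply it to pre-trained BERT, then fine-tune. A hypothetical sketch with Hugging Face Transformers (the checkpoint name, mask layout, and `train_fn` are placeholders, not the thesis code):

```python
import torch
from transformers import BertForSequenceClassification

def apply_mask_and_finetune(mask: dict, train_fn):
    """Apply a pre-computed binary mask s to pre-trained BERT, then fine-tune."""
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in mask:                       # zero out the pruned structures
                param.mul_(mask[name].to(param.dtype))
    train_fn(model)                                # standard downstream fine-tuning
    return model
```

Keeping the pruned entries at zero during fine-tuning additionally requires re-applying the mask (for example after each optimizer step), since gradient updates would otherwise repopulate them.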
Finding the Dominant Winning Ticket in Pre-Trained Language Models
2022
Findings of the Association for Computational Linguistics: ACL 2022
unpublished
In this paper, we study whether there is a winning lottery ticket for pre-trained language models, which allow the practitioners to fine-tune the parameters in the ticket but achieve good downstream performance ...
tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix. ...
Acknowledgments This work is supported in part by the National Hi-Tech RD Program of China (No.2020AAA0106600). ...
doi:10.18653/v1/2022.findings-acl.115
fatcat:vib7ttjq3rfijixibcn3vtdu2m
What's Hidden in a One-layer Randomly Weighted Transformer?
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
unpublished
The lottery ticket hypothesis for pre-trained BERT networks. arXiv preprint. ...
Playing lottery tickets with vision and language. arXiv preprint arXiv:2104.11832. ...
In a general pruning framework,
Lottery Tickets Hypothesis. ...
doi:10.18653/v1/2021.emnlp-main.231
fatcat:qhzyr4d3rjcsjgezsjwhiue5vu
Efficient Training and Compression of Deep Neural Networks
2022
typical resources available to the majority of machine learning practitioners. ...
Therefore, in practice, dense matrix multiplications are carried out on a sparse network by multiplying the parameter tensors with a binary mask, leading to more parameters, not less. ...
better pruning masks, i.e., lottery tickets. ...
doi:10.17638/03157802
fatcat:kboe4vvizfcyhlbdpx6lw7cnzm
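The point above, that "sparse" networks are in practice executed as dense multiplications against a binary mask, can be seen in a small example (shapes are arbitrary and chosen only for illustration):

```python
import torch

weight = torch.randn(768, 768)                     # dense parameter tensor
mask = (torch.rand_like(weight) > 0.9).float()     # ~10% of the weights kept
x = torch.randn(32, 768)

# The "sparse" layer still stores both `weight` and `mask` and runs a full
# dense matmul, so unstructured sparsity alone saves no memory or FLOPs here.
out = x @ (weight * mask).T
```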
Representation Learning and Learning from Limited Labeled Data for Community Question Answering
2021
Accessing the vast knowledge in them more effectively is a fundamental goal of many tasks in natural language processing. ...
Besides purely monolingual approaches, we study how to transfer text representations across languages. ...
I would also like to thank Prof. Dr. Jonathan Berant and Prof. Dr. Goran Glavaš for investing their time in reviewing my thesis. ...
doi:10.26083/tuprints-00018508
fatcat:lrqvfpbsvbb2lfo4bqrwu3jaym
Dynamic Mathematics for Automated Machine Learning Techniques
[article]
2021
However, modern machine learning techniques such as backpropagation training were firmly established in 1986, while computer vision was revolutionised in 2012 with the introduction of AlexNet. ...
"Because they are difficult to implement in practice." I'd like to use machine learning, but I can't invest much time. ...
The lottery ticket refers to the smallest subnetwork consisting of all essential connections within a much larger dense network. Frankle and Carbin sought their candidate ticket via cyclic training. ...
doi:10.25911/zmy2-7160
fatcat:flnkwfv33rbupg2e5m4twnbaie
Insights from Deep Representations for Machine Learning Systems and Human Collaborations
2020
Finally, we study how these fully trained AI systems can be adapted to work effectively with human experts, resulting in better outcomes than either humans or AI alone. ...
In this thesis, we present research results that take steps to addressing these challenges. ...
A good overview of BERT and transfer learning in NLP is given in http://jalammar.github.io/illustrated-bert/. ...
doi:10.7298/xvk2-m314
fatcat:f3qjq56xyrdbpognytmc6oizsu
Transformer-based NMT : modeling, training and implementation
[article]
2021
Transformer-Based NMT: Modeling, Training and Implementation ...
I also thank our supporters, collaborators and anonymous reviewers for their efforts in helping us improve our work. ...
For example, the Lottery Ticket (LT) hypothesis (Frankle and Carbin, 2019; Frankle et al., 2019; Dettmers and Zettlemoyer, 2019) suggests that there is a sparse sub-network in a dense network that outperforms ...
doi:10.22028/d291-34998
fatcat:4d226kujkjeodaiwzxltmgluhi