BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
[article]
2022
arXiv
pre-print
For larger data, the method is competitive with other sparse fine-tuning methods. ...
We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. ...
Introduction Large pre-trained transformer-based language models, and in particular bidirectional masked language models from the BERT family (Devlin et al., 2018), are responsible for significant gains ...
arXiv:2106.10199v4
fatcat:nynbsljxene2rghcapeisu75ly
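The snippet above only names the method, so here is a minimal sketch of the bias-only selection BitFit describes, assuming PyTorch and a Hugging Face BERT checkpoint ("bert-base-uncased" is an illustrative choice); whether the task head is also left trainable is an implementation choice, not something the excerpt states.

```python
# Minimal sketch of bias-only fine-tuning in the spirit of BitFit.
# Parameter names follow Hugging Face BERT conventions and may differ elsewhere.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

for name, param in model.named_parameters():
    # Keep biases (and, here, the task head) trainable; freeze everything else.
    param.requires_grad = name.endswith(".bias") or "classifier" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```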
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
[article]
2022
arXiv
pre-print
Transformer-based pre-trained models with millions of parameters require large storage. ...
The experiments show that our proposed method dramatically reduces the trainable parameters compared to previous works, with a minimal decrease in task performance compared with fine-tuned pre-trained ...
Introduction While large pre-trained language models (PLMs) reached state-of-the-art results on natural language processing (NLP) tasks, PLMs require updating all parameters and storing the fully fine-tuned ...
arXiv:2205.00305v1
fatcat:xgb2jjg2onavrox7mocz4duayy
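As a rough illustration of the token-dependent representation shift the excerpt mentions, the sketch below adds a single learned vector scaled per token by a scalar produced from that token's hidden state; the module and parameter names are illustrative, not the paper's.

```python
# Hedged sketch of a token-dependent bias shift in the spirit of AdapterBias:
# one shared vector v, scaled per token by a scalar alpha derived from the
# token's hidden state.
import torch
import torch.nn as nn

class TokenDependentShift(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_size))   # shared shift vector
        self.alpha = nn.Linear(hidden_size, 1)            # per-token scalar weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        scale = self.alpha(hidden_states)                  # (batch, seq_len, 1)
        return hidden_states + scale * self.v              # broadcast shift

shift = TokenDependentShift(hidden_size=768)
print(shift(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```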
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
[article]
2022
arXiv
pre-print
In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. ...
In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs ...
Thanks to all the pioneering researchers who developed the structures, objectives, and delta tuning methods for pre-trained models. Ning Ding is supported by Baidu Scholarship. ...
arXiv:2203.06904v2
fatcat:yk2v44f74zbe7hfw4lw2nq7eju
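The survey covers several delta-tuning families; the sketch below illustrates just one of them, a low-rank reparameterization of the weight update around a frozen linear layer. Class and attribute names are illustrative, not taken from the survey.

```python
# Hedged sketch of reparameterization-style delta tuning: the pretrained weight
# stays frozen and only a low-rank update (the "delta") is trained.
import torch
import torch.nn as nn

class LowRankDeltaLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False             # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        out_f, in_f = base.weight.shape
        self.delta_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.delta_b = nn.Parameter(torch.zeros(out_f, rank))  # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + x (B A)^T : only A and B are trained.
        return self.base(x) + x @ self.delta_a.t() @ self.delta_b.t()

layer = LowRankDeltaLinear(nn.Linear(768, 768), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 2 * 8 * 768
```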
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
[article]
2021
arXiv
pre-print
Fine-tuning all parameters of a pre-trained model has become the mainstream approach for transfer learning. ...
Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. ...
[Figure labels: Pretrained model; 2a. Sparse language fine-tuning; 2b. ...; 3. Fine-tuned model] ...
arXiv:2110.07560v1
fatcat:ntymeknwancdnpolmgv7tygl34
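To make the idea of composable sparse fine-tuning concrete, here is a hedged sketch that extracts a sparse difference vector from a fine-tuning run and adds several such diffs (e.g. a language diff and a task diff) onto the pretrained weights; the paper's procedure for choosing the sparse mask is only approximated by the magnitude criterion used here.

```python
# Hedged sketch: sparse difference vectors composed onto a frozen pretrained model.
import torch

def sparse_diff(pretrained: dict, finetuned: dict, keep_ratio: float = 0.05) -> dict:
    """Keep only the largest-magnitude parameter changes; zero out the rest."""
    diffs = {}
    for name, p0 in pretrained.items():
        delta = finetuned[name] - p0
        k = max(1, int(delta.numel() * keep_ratio))
        threshold = delta.abs().flatten().topk(k).values.min()
        diffs[name] = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
    return diffs

def compose(pretrained: dict, *sparse_diffs: dict) -> dict:
    """Add several sparse diffs (e.g. language + task) to the pretrained weights."""
    composed = {name: p.clone() for name, p in pretrained.items()}
    for diff in sparse_diffs:
        for name, delta in diff.items():
            composed[name] += delta
    return composed
```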
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
[article]
2022
arXiv
pre-print
Then we propose a parameter-efficient fine-tuning strategy to boost the few-shot performance on the VQA task. ...
However, after being pre-trained by language supervision from a large amount of image-caption pairs, CLIP itself should also have acquired some few-shot abilities for vision-language tasks. ...
To investigate whether our BiNor fine-tuning strategy works well, we compare BiNor with two fine-tuning methods: 1) Full-FT (Full fine-tuning), which updates all parameters in the model. 2) BitFit (Ben ...
arXiv:2203.07190v1
fatcat:whf2ljh2mjfa5l4wsbr5dpvktq
MultiEURLEX – A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
[article]
2021
arXiv
pre-print
Adaptation strategies, namely partial fine-tuning, adapters, BITFIT, LNFIT, originally proposed to accelerate fine-tuning for new end-tasks, help retain multilingual knowledge from pretraining, substantially ...
We find that fine-tuning a multilingually pretrained model (XLM-ROBERTA, MT5) in a single source language leads to catastrophic forgetting of multilingual knowledge and, consequently, poor zero-shot transfer ...
We are grateful to Cognitiv+ Ltd. 14 for providing the compute infrastructure (an NVIDIA DGX-1 server with 8x NVIDIA V100 cards) for running the overwhelming number of experiments. ...
arXiv:2109.00904v2
fatcat:wn2pb26cp5csrkdpbh6jusrdum
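The LNFIT strategy named in the excerpt can be sketched the same way as the BitFit example earlier, this time keeping only LayerNorm parameters trainable; parameter names assume a Hugging Face XLM-R checkpoint and may differ in other codebases.

```python
# Hedged sketch of LNFIT-style selection: only LayerNorm parameters stay trainable.
from transformers import AutoModel

model = AutoModel.from_pretrained("xlm-roberta-base")
for name, param in model.named_parameters():
    param.requires_grad = "LayerNorm" in name
```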
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
[article]
2022
arXiv
pre-print
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that ...
In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. ...
Acknowledgements The authors would like to thank Sebastian Ruder and Marius Mosbach for their comments on drafts of this paper. ...
arXiv:2204.01172v2
fatcat:gncdw4qptrhwnhdlgo5yydb4ja
Learning for Expressive Task-Related Sentence Representations
[article]
2022
arXiv
pre-print
NLP models learn sentence representations for downstream tasks by tuning a model which is pre-trained by masked language modeling. ...
Experimental results show that, despite tuning only 5% additional parameters over a frozen pre-trained model, our model can achieve classification results comparable to the SOTA while maintaining strong ...
Parameter Efficient Tuning Parameter efficient tuning learns for target tasks by tuning limited trainable parameters constructed on a pretrained model. ...
arXiv:2205.12186v1
fatcat:4wimp3gjcjfe5dae4cy7chkhe4
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
[article]
2022
arXiv
pre-print
Parameter-efficient fine-tuning (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform ...
We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. ...
Acknowledgments and Disclosure of Funding We thank Brian Lester and Noah Constant for helpful discussion on debugging prompt tuning and Rabeeh Karimi Mahabadi for help with Compacter and Intrinsic SAID ...
arXiv:2205.05638v1
fatcat:y3tw7s3oozg3bhtbe3jpbtdhii
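One ingredient of the T-Few recipe mentioned above is a set of learned elementwise rescaling vectors (the paper calls this (IA)^3). The sketch below shows the idea on a simplified feed-forward block; in the actual method the surrounding linear layers come from the frozen pretrained model rather than being created fresh, and keys and values in attention are rescaled as well.

```python
# Hedged sketch of learned elementwise rescaling, simplified to a feed-forward block.
import torch
import torch.nn as nn

class RescaledFeedForward(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up = nn.Linear(hidden_size, intermediate_size)
        self.down = nn.Linear(intermediate_size, hidden_size)
        # The only new trainable parameters: an elementwise scaling vector,
        # initialised to ones so the pretrained behaviour is unchanged at the start.
        self.scale = nn.Parameter(torch.ones(intermediate_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.relu(self.up(x)) * self.scale)
```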
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
[article]
2021
arXiv
pre-print
Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. ...
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. ...
Acknowledgments We would like to thank Paul Cummer for his insightful comments on this work. ...
arXiv:2111.01243v1
fatcat:4xfjkkby2bfnhdrhmrdlliy76m
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
[article]
2021
arXiv
pre-print
Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. ...
Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. ...
Acknowledgements We thank the reviewers and the AC for their considerate comments. We also thank the LTL members and Xun Wang for insightful feedback. ...
arXiv:2104.08027v2
fatcat:aoeddhiep5cjvemqioa3dlgvee
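The identity fine-tuning described in the excerpt can be approximated with an InfoNCE-style loss over two encodings of the same strings (dropout provides the two views); the sketch below shows only the loss, not Mirror-BERT's augmentation or pooling choices, and the temperature value is illustrative.

```python
# Hedged sketch of contrastive "identity fine-tuning": two views of the same
# batch of strings are pulled together against in-batch negatives.
import torch
import torch.nn.functional as F

def identity_contrastive_loss(view_a: torch.Tensor, view_b: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    # view_a, view_b: (batch, dim) embeddings of the same batch of strings
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                       # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=view_a.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```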
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
[article]
2021
arXiv
pre-print
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml. ...
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. ...
We thank Sebastian Ruder, Max Glockner, Jason Phang, Alex Wang, Katrina Evtimova and Sam Bowman for insightful feedback and suggestions on drafts of this paper. ...
arXiv:2005.00247v3
fatcat:rhjexrlidzcjtck5xjqcmaxmxe
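As a rough picture of how multiple task adapters can be composed non-destructively, the sketch below lets a layer's hidden state attend over the outputs of several frozen adapters; the actual AdapterFusion layer has additional projections and residual connections, and the names here are mine.

```python
# Hedged sketch of attention-based fusion over several adapter outputs.
import torch
import torch.nn as nn

class AdapterFusionSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor, adapter_outputs: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d); adapter_outputs: (batch, seq, n_adapters, d)
        q = self.query(hidden).unsqueeze(2)                 # (batch, seq, 1, d)
        k = self.key(adapter_outputs)                       # (batch, seq, n, d)
        v = self.value(adapter_outputs)
        scores = (q * k).sum(-1).softmax(dim=-1)            # (batch, seq, n) mixing weights
        return (scores.unsqueeze(-1) * v).sum(dim=2)        # (batch, seq, d)
```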
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings
[article]
2022
arXiv
pre-print
Furthermore, we demonstrate that it can train (or tune) models sample-efficiently, and that it can be combined with recent training-efficient methods. ...
Perhaps surprisingly, even training a general-domain language model this way outperforms baselines pretrained in-domain. ...
To explore the effect of compute-efficient fine-tuning, we also train a BitFit model (Zaken et al., 2021) with λ = 1e−4 (§7.2). ...
arXiv:2202.06671v1
fatcat:uyyagcrslza7dhi6mjuvloa4v4
Improving language models fine-tuning with representation consistency targets
[article]
2022
arXiv
pre-print
Fine-tuning contextualized representations learned by pre-trained language models has become a standard practice in the NLP field. ...
We show that our approach matches or exceeds the performance of the existing regularization-based fine-tuning methods across 13 language understanding tasks (GLUE benchmark and six additional datasets) ...
Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. ... Stoyanov. 2020. Supervised contrastive learning for pre-trained language model fine-tuning. ...
arXiv:2205.11603v1
fatcat:rniglb3q5res5pl6ep3px2c5aq
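A representation-consistency target of the kind the excerpt alludes to can be written as a simple regularizer that penalizes drift from the frozen pretrained encoder; the exact targets and distance used in the paper may differ from this sketch.

```python
# Hedged sketch of a representation-consistency regularizer added to the task loss.
import torch
import torch.nn.functional as F

def consistency_regularized_loss(task_loss: torch.Tensor,
                                 tuned_repr: torch.Tensor,
                                 pretrained_repr: torch.Tensor,
                                 lam: float = 0.1) -> torch.Tensor:
    # tuned_repr / pretrained_repr: (batch, dim) representations of the same inputs
    consistency = F.mse_loss(tuned_repr, pretrained_repr.detach())
    return task_loss + lam * consistency
```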
Intrinisic Gradient Compression for Federated Learning
[article]
2021
arXiv
pre-print
Finally, in large-scale federated learning experiments with models containing up to 100M parameters, we show that our algorithms perform extremely well compared to current state-of-the-art gradient compression ...
Specifically, we present three algorithms in this family with different levels of upload and download bandwidth for use in various federated settings, along with theoretical guarantees on their performance ...
Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint, 2021. [30] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin ...
arXiv:2112.02656v1
fatcat:bmkxosl22rgnln5ikdbaayzofi
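The entry above builds on the intrinsic-dimension idea of training inside a fixed random low-dimensional subspace, so that only a small vector needs to be communicated; the sketch below shows that reparameterization, without the paper's federated bookkeeping, and the dimensions are illustrative.

```python
# Hedged sketch: all parameter updates live in a fixed random low-dimensional
# subspace, so a client only needs to communicate the small vector d.
import torch

torch.manual_seed(0)
full_dim, intrinsic_dim = 100_000, 256
theta0 = torch.randn(full_dim)  # stand-in for the flattened pretrained parameters (frozen)
projection = torch.randn(full_dim, intrinsic_dim) / intrinsic_dim ** 0.5  # shared, fixed
d = torch.zeros(intrinsic_dim, requires_grad=True)  # the only trained/communicated vector

def effective_parameters() -> torch.Tensor:
    # theta = theta0 + A @ d : the model is evaluated with these parameters.
    return theta0 + projection @ d
```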
Showing results 1 — 15 out of 16 results