16 Hits in 1.9 sec

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models [article]

Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
2022 arXiv   pre-print
For larger data, the method is competitive with other sparse fine-tuning methods.  ...  We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.  ...  Introduction Large pre-trained transformer-based language models, and in particular bidirectional masked language models from the BERT family (Devlin et al., 2018), are responsible for significant gains  ... 
arXiv:2106.10199v4 fatcat:nynbsljxene2rghcapeisu75ly

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [article]

Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee
2022 arXiv   pre-print
Transformer-based pre-trained models with millions of parameters require large storage.  ...  The experiments show that our proposed method can dramatically reduce the trainable parameters compared to previous works, with a minimal decrease in task performance compared with fine-tuned pre-trained  ...  Introduction While large pre-trained language models (PLMs) reached state-of-the-art results on natural language processing (NLP) tasks, PLMs require updating all parameters and storing the fully fine-tuned  ... 
arXiv:2205.00305v1 fatcat:xgb2jjg2onavrox7mocz4duayy

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [article]

Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao (+8 others)
2022 arXiv   pre-print
In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible.  ...  In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs  ...  Thanks to all the pioneering researchers who developed the structures, objectives, and delta tuning methods for pre-trained models. Ning Ding is supported by Baidu Scholarship.  ... 
arXiv:2203.06904v2 fatcat:yk2v44f74zbe7hfw4lw2nq7eju

Composable Sparse Fine-Tuning for Cross-Lingual Transfer [article]

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić
2021 arXiv   pre-print
Fine-tuning all parameters of a pre-trained model has become the mainstream approach for transfer learning.  ...  Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture.  ... 
arXiv:2110.07560v1 fatcat:ntymeknwancdnpolmgv7tygl34

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment [article]

Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
2022 arXiv   pre-print
Then we propose a parameter-efficient fine-tuning strategy to boost the few-shot performance on the VQA task.  ...  However, after being pre-trained by language supervision from a large amount of image-caption pairs, CLIP itself should also have acquired some few-shot abilities for vision-language tasks.  ...  To investigate whether our BiNor fine-tuning strategy works well, we compare BiNor with two fine-tuning methods: 1) Full-FT (Full fine-tuning), which updates all parameters in the model. 2) BitFit (Ben  ... 
arXiv:2203.07190v1 fatcat:whf2ljh2mjfa5l4wsbr5dpvktq

MultiEURLEX – A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer [article]

Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
2021 arXiv   pre-print
Adaptation strategies, namely partial fine-tuning, adapters, BITFIT, LNFIT, originally proposed to accelerate fine-tuning for new end-tasks, help retain multilingual knowledge from pretraining, substantially  ...  We find that fine-tuning a multilingually pretrained model (XLM-ROBERTA, MT5) in a single source language leads to catastrophic forgetting of multilingual knowledge and, consequently, poor zero-shot transfer  ...  We are grateful to Cognitiv+ Ltd. 14 for providing the compute infrastructure (an NVIDIA DGX-1 server with 8x NVIDIA V100 cards) for running the overwhelming number of experiments.  ... 
arXiv:2109.00904v2 fatcat:wn2pb26cp5csrkdpbh6jusrdum

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models [article]

Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Veselin Stoyanov, Majid Yazdani
2022 arXiv   pre-print
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that  ...  In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points.  ...  Acknowledgements The authors would like to thank Sebastian Ruder and Marius Mosbach for their comments on drafts of this paper.  ... 
arXiv:2204.01172v2 fatcat:gncdw4qptrhwnhdlgo5yydb4ja

Learning for Expressive Task-Related Sentence Representations [article]

Xueying Bai, Jinghuan Shang, Yifan Sun, Niranjan Balasubramanian
2022 arXiv   pre-print
NLP models learn sentence representations for downstream tasks by tuning a model which is pre-trained by masked language modeling.  ...  Experimental results show that, despite tuning only 5% additional parameters over a frozen pre-trained model, our model can achieve classification results comparable to the SOTA while maintaining strong  ...  Parameter Efficient Tuning Parameter efficient tuning learns for target tasks by tuning limited trainable parameters constructed on a pretrained model.  ... 
arXiv:2205.12186v1 fatcat:4wimp3gjcjfe5dae4cy7chkhe4

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [article]

Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
2022 arXiv   pre-print
Parameter-efficient fine-tuning (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform  ...  We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications.  ...  Acknowledgments and Disclosure of Funding We thank Brian Lester and Noah Constant for helpful discussion on debugging prompt tuning and Rabeeh Karimi Mahabadi for help with Compacter and Intrinsic SAID  ... 
arXiv:2205.05638v1 fatcat:y3tw7s3oozg3bhtbe3jpbtdhii

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey [article]

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, Dan Roth
2021 arXiv   pre-print
Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field.  ...  We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.  ...  Acknowledgments We would like to thank Paul Cummer for his insightful comments on this work.  ... 
arXiv:2111.01243v1 fatcat:4xfjkkby2bfnhdrhmrdlliy76m

Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders [article]

Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier
2021 arXiv   pre-print
Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years.  ...  Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning.  ...  Acknowledgements We thank the reviewers and the AC for their considerate comments. We also thank the LTL members and Xun Wang for insightful feedback.  ... 
arXiv:2104.08027v2 fatcat:aoeddhiep5cjvemqioa3dlgvee

AdapterFusion: Non-Destructive Task Composition for Transfer Learning [article]

Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
2021 arXiv   pre-print
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at  ...  Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing.  ...  We thank Sebastian Ruder, Max Glockner, Jason Phang, Alex Wang, Katrina Evtimova and Sam Bowman for insightful feedback and suggestions on drafts of this paper.  ... 
arXiv:2005.00247v3 fatcat:rhjexrlidzcjtck5xjqcmaxmxe

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings [article]

Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm
2022 arXiv   pre-print
Furthermore, we demonstrate that it can train (or tune) models sample-efficiently, and that it can be combined with recent training-efficient methods.  ...  Perhaps surprisingly, even training a general-domain language model this way outperforms baselines pretrained in-domain.  ...  To explore the effect of compute-efficient fine-tuning we also train a BitFit model (Zaken et al., 2021) with λ=1e−4 (§7.2).  ... 
arXiv:2202.06671v1 fatcat:uyyagcrslza7dhi6mjuvloa4v4

Improving language models fine-tuning with representation consistency targets [article]

Anastasia Razdaibiedina, Vivek Madan, Zohar Karnin, Ashish Khetan, Vishaal Kapoor
2022 arXiv   pre-print
Fine-tuning contextualized representations learned by pre-trained language models has become a standard practice in the NLP field.  ...  We show that our approach matches or exceeds the performance of the existing regularization-based fine-tuning methods across 13 language understanding tasks (GLUE benchmark and six additional datasets).  ...  Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models.  ...  Stoyanov. 2020. Supervised contrastive learning for pre-trained language model fine-tuning.  ... 
arXiv:2205.11603v1 fatcat:rniglb3q5res5pl6ep3px2c5aq

Intrinsic Gradient Compression for Federated Learning [article]

Luke Melas-Kyriazi, Franklyn Wang
2021 arXiv   pre-print
Finally, in large-scale federated learning experiments with models containing up to 100M parameters, we show that our algorithms perform extremely well compared to current state-of-the-art gradient compression  ...  Specifically, we present three algorithms in this family with different levels of upload and download bandwidth for use in various federated settings, along with theoretical guarantees on their performance  ...  Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint, 2021. [30] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin  ... 
arXiv:2112.02656v1 fatcat:bmkxosl22rgnln5ikdbaayzofi