Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
[article]
2022
arXiv
pre-print
We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. ...
We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the ...
TM acknowledges support of Google Faculty Award, NSF IIS 2045685, and JD.com. ...
arXiv:2106.09226v2
fatcat:lkf6ixbvrvcj3dg5rw3dqtmfau
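The entry above contrasts head tuning (a classifier learned on top of a frozen pretrained model) with prompt tuning (learned continuous prompt vectors prepended to the input). Below is a minimal sketch of the two setups, assuming a HuggingFace-style encoder; the model name, prompt length, and class count are illustrative, not taken from the paper.

```python
# Minimal sketch of head tuning vs. prompt tuning on a frozen pretrained encoder.
# Model name, prompt length, and class count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # the pretrained model stays frozen in both settings

num_classes = 3
hidden = encoder.config.hidden_size

# Head tuning: only this classifier is trained.
head = nn.Linear(hidden, num_classes)

# Prompt tuning: only these continuous prompt vectors are trained.
prompt_len = 20
soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

def classify(input_ids, attention_mask, use_prompt=False):
    embeds = encoder.get_input_embeddings()(input_ids)        # word embeddings only
    if use_prompt:
        batch = embeds.size(0)
        prompt = soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        embeds = torch.cat([prompt, embeds], dim=1)
        attention_mask = torch.cat(
            [torch.ones(batch, prompt_len, dtype=attention_mask.dtype), attention_mask],
            dim=1,
        )
    hidden_states = encoder(inputs_embeds=embeds,
                            attention_mask=attention_mask).last_hidden_state
    return head(hidden_states[:, 0])                          # predict from the first position

# Head tuning would pass head.parameters() to the optimizer;
# prompt tuning would pass [soft_prompt] (optionally plus the head).
```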
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
[article]
2022
arXiv
pre-print
Lastly, we present a comprehensive analysis including the combination of adapter and task-specific prompts and the impact of V&L pre-training on adapters. ...
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. ...
The views, opinions, and/or findings contained in this article are those of the authors and not of the funding agency. ...
arXiv:2112.06825v2
fatcat:dil7vyflcjevvcfsjjsqtpnptm
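The VL-Adapter entry above relies on adapters: small trainable modules inserted into an otherwise frozen pretrained model. A minimal sketch of a standard bottleneck adapter follows; the hidden size and reduction factor are assumptions for illustration, not values from the paper.

```python
# Minimal bottleneck-adapter sketch: a small residual MLP inserted inside a frozen
# transformer layer; only the adapter parameters are trained. Sizes are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, reduction=16):
        super().__init__()
        bottleneck = hidden_size // reduction
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        # Residual connection keeps the frozen layer's output intact at initialization.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
x = torch.randn(2, 10, 768)          # (batch, seq, hidden)
print(adapter(x).shape)              # torch.Size([2, 10, 768])
```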
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
[article]
2021
arXiv
pre-print
We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. ...
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. ...
We thank Sebastian Ruder, Max Glockner, Jason Phang, Alex Wang, Katrina Evtimova and Sam Bowman for insightful feedback and suggestions on drafts of this paper. ...
arXiv:2005.00247v3
fatcat:rhjexrlidzcjtck5xjqcmaxmxe
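AdapterFusion composes the outputs of several independently trained task adapters with an attention-like mechanism, and only the fusion parameters are learned in that stage. A simplified sketch of the composition step, with shapes and projections assumed for illustration:

```python
# Sketch of AdapterFusion-style composition: attention over the outputs of several
# task adapters, with only the fusion projections being learned. Shapes are assumed.
import torch
import torch.nn as nn

class AdapterFusion(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, layer_output, adapter_outputs):
        # layer_output:    (batch, seq, hidden) from the frozen transformer layer
        # adapter_outputs: (batch, seq, num_adapters, hidden), one slice per adapter
        q = self.query(layer_output).unsqueeze(2)            # (b, s, 1, h)
        k = self.key(adapter_outputs)                        # (b, s, n, h)
        v = self.value(adapter_outputs)                      # (b, s, n, h)
        scores = torch.softmax((q * k).sum(-1), dim=-1)      # (b, s, n) weights over adapters
        return (scores.unsqueeze(-1) * v).sum(2)             # (b, s, h) fused representation

fusion = AdapterFusion(hidden_size=768)
out = fusion(torch.randn(2, 10, 768), torch.randn(2, 10, 3, 768))
print(out.shape)                     # torch.Size([2, 10, 768])
```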
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
[article]
2022
arXiv
pre-print
Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. ...
Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. ...
In this work, we explore several methods for leveraging large-scale pretrained unimodal encoders in downstream VL tasks, including direct finetuning, adding adapters to the unimodal models, and ...
arXiv:2204.10496v2
fatcat:t2lgj4cpxfg6nfdbnnayoud2mq
Exploring Universal Intrinsic Task Subspace via Prompt Tuning
[article]
2022
arXiv
pre-print
Why can pre-trained language models (PLMs) learn universal representations and adapt effectively to a broad range of NLP tasks that differ greatly on the surface? ...
To find such a subspace and examine its universality, we propose an analysis pipeline called intrinsic prompt tuning (IPT). ...
The mainstream way of downstream adaptation is fine-tuning, which adds task-specific classification heads and tunes all the PLM parameters with supervised data. ...
arXiv:2110.07867v2
fatcat:yinwtdo57nfovjvv4bnkqsmwl4
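The IPT entry above looks for a low-dimensional "intrinsic task subspace": tuning only a handful of free parameters, decoded into a full soft prompt, should adapt the PLM to many tasks. A minimal sketch of that reparameterization; the dimensions and the linear decoder are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch of the reparameterization behind intrinsic-subspace prompt tuning: the soft
# prompt is generated from a low-dimensional vector through a shared decoder, so only
# the low-dimensional vector is tuned per task. All sizes here are assumptions.
import torch
import torch.nn as nn

prompt_len, hidden, intrinsic_dim = 20, 768, 5

decoder = nn.Linear(intrinsic_dim, prompt_len * hidden)   # shared, kept fixed for new tasks
z = nn.Parameter(torch.zeros(intrinsic_dim))              # the only task-specific parameters

soft_prompt = decoder(z).view(prompt_len, hidden)         # prepend to the input embeddings
print(soft_prompt.shape)                                  # torch.Size([20, 768])
```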
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
[article]
2022
arXiv
pre-print
Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own. ...
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during ...
We would also like to thank the authors of Mesh Tensorflow (Shazeer et al., 2018) and T5, as their high-quality code and paper enabled this work. ...
arXiv:2111.10952v2
fatcat:ifwuj2gcufhbhpqg3au4e2ihne
CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems
[article]
2022
arXiv
pre-print
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, i.e. intent classification, dialog state tracking, and ...
Recently, prompting methods over pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. ...
However, the general objectives and tasks during the model pretraining phase are often very different from the formulation of specific downstream ToD tasks. ...
arXiv:2109.04645v4
fatcat:ak2l7tjrsrck5lnthbd7in6kjq
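CINS builds each task instruction from a (definition, constraint, prompt) schema. Below is a hypothetical example of what such an instruction might look like for intent classification; the wording and intent labels are illustrative of the schema only, not text from the paper.

```python
# Illustrative instruction under a (definition, constraint, prompt) schema for intent
# classification in task-oriented dialog; all wording here is a hypothetical example.
instruction = {
    "definition": "Intent classification decides which intent the user utterance expresses.",
    "constraint": "The intent must be one of: book_flight, cancel_booking, check_status.",
    "prompt": "The intent of the utterance is [MASK].",
}
utterance = "I'd like to get a ticket to Boston next Monday."
model_input = " ".join(instruction.values()) + " Utterance: " + utterance
print(model_input)
```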
Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer
[article]
2022
arXiv
pre-print
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning. ...
Recent advances on prompt-tuning cast few-shot classification tasks as a masked language modeling problem. ...
As mentioned previously, prompt-tuning, a recently emerged paradigm, can work well with little training data with the help of the pretrained masked language modeling head. ...
arXiv:2201.05411v1
fatcat:sl5i323pfjh3dpkjq536qqbo7i
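This entry builds on prompt-tuning's reformulation of few-shot classification as masked language modeling: a template containing a mask token is scored by the pretrained MLM head, and a verbalizer maps classes to vocabulary words. A minimal sketch of that baseline formulation follows (the paper's contribution replaces the hand-written verbalizer with learned prototypes); the template, label words, and model name are assumptions.

```python
# Minimal sketch of prompt-based classification with a masked LM and a hand-written
# verbalizer; the template and label words are illustrative, not from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

verbalizer = {"positive": "great", "negative": "terrible"}    # class -> label word
text = "The plot was gripping from start to finish."
prompt = f"{text} It was {tokenizer.mask_token}."

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]              # vocabulary scores at [MASK]

label_ids = {c: tokenizer.convert_tokens_to_ids(w) for c, w in verbalizer.items()}
scores = {c: logits[i].item() for c, i in label_ids.items()}
print(max(scores, key=scores.get))                            # predicted class
```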
Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
[article]
2022
arXiv
pre-print
These tasks are collected with contributions of NLP practitioners in the community and through an iterative peer review process to ensure their quality. ...
To facilitate progress in this goal, we introduce Natural-Instructions v2, a benchmark of 1,600+ diverse language tasks and their expert-written instructions. ...
We also thank CSE 576 Topics in NLP class students at Arizona State University and all contributors who contributed to the repository. ...
arXiv:2204.07705v2
fatcat:ge3pnvgg6fczvcza7efvb6jkfi
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
[article]
2021
arXiv
pre-print
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. ...
These models provide good background knowledge for downstream tasks, which avoids training downstream models from scratch. ...
Acknowledgments: Kalyan would like to thank his father Katikapalli Subramanyam for giving a) $750 to buy a new laptop, a 24-inch monitor, and a study table, and b) $180 for a one-year subscription to Medium, Overleaf ...
arXiv:2108.05542v2
fatcat:4uyj6uut65d37hfi7yss2fek6q
WARP: Word-level Adversarial ReProgramming
[article]
2021
arXiv
pre-print
Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. ...
A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model. ...
On the sentiment analysis task, the performance is comparable to the fully fine-tuned language models. ...
arXiv:2101.00121v2
fatcat:zkzwn7w3qrau7louqmqy3fmlqi
Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
2020
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20)
QuAIL contains 15K multiple-choice questions for 800 texts in 4 domains. Crucially, it offers both general and text-specific questions that are unlikely to be found in pretraining data. ...
Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. ...
Acknowledgements: This project is funded in part by an NSF CAREER award to Anna Rumshisky (IIS-1652742). ...
doi:10.1609/aaai.v34i05.6398
fatcat:fu2eqrg54zevzclhumczzwke5a
Declaration-based Prompt Tuning for Visual Question Answering
[article]
2022
arXiv
pre-print
... model is first optimized via self-supervised task objectives, e.g., masked language modeling (MLM) and image-text matching (ITM), and then fine-tuned to adapt to the downstream task (e.g., VQA) via a brand-new ...
... of the VQA model, boosting the effective adaptation of pre-trained VL models to the downstream task. ...
Inspired by the recent progress of vision-language pretrained models (VL-PTMs) and prompt tuning paradigms in the cross-modal domain [Yao et al., 2021; Tsimpoukelli et al., 2021; Radford et al., 2021], in ...
arXiv:2205.02456v1
fatcat:47gbc67h6vcs5lxkgghopquaxu
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
[article]
2022
arXiv
pre-print
... and masked language modeling), and evaluated with and without multitask prompted finetuning. ...
We find that pretrained non-causal decoder models can be adapted into performant generative causal decoder models, using autoregressive language modeling as a downstream task. ...
Specifically, this work was conducted by a task force within the architecture & scaling group, seeking to establish the optimal architecture and pretraining objective for the final 176B parameter model ...
arXiv:2204.05832v1
fatcat:srmtrteyozbarl2v5tqu5s7mwe
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
[article]
2021
arXiv
pre-print
We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. ...
We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed. ...
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ...
arXiv:2104.08771v2
fatcat:mtua5a3nbrdg5g5iabnehgw7ui
Showing results 1 — 15 out of 242 results