
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning [article]

Colin Wei, Sang Michael Xie, Tengyu Ma
2022 arXiv   pre-print
We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting.  ...  We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the  ...  TM acknowledges support of Google Faculty Award, NSF IIS 2045685, and JD.com.  ... 
arXiv:2106.09226v2 fatcat:lkf6ixbvrvcj3dg5rw3dqtmfau
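
A minimal PyTorch sketch of the two settings analyzed in the entry above, head tuning versus prompt tuning; the encoder is a generic stand-in rather than the paper's model, and all module names, dimensions, and the prompt length are assumptions.

    import torch
    import torch.nn as nn

    # Illustrative stand-in for a frozen pretrained encoder (not the paper's model).
    class FrozenEncoder(nn.Module):
        def __init__(self, vocab_size=30522, dim=768):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.layer = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)

        def forward(self, input_ids, soft_prompt=None):
            x = self.embed(input_ids)
            if soft_prompt is not None:  # prompt tuning: prepend trainable vectors
                x = torch.cat([soft_prompt.expand(x.size(0), -1, -1), x], dim=1)
            return self.layer(x)

    encoder = FrozenEncoder()
    for p in encoder.parameters():  # pretrained weights stay frozen in both settings
        p.requires_grad = False

    # Head tuning: only a classifier on top of the frozen features is trained.
    head = nn.Linear(768, 2)

    # Prompt tuning: only a few continuous prompt embeddings are trained.
    soft_prompt = nn.Parameter(torch.randn(1, 20, 768) * 0.02)

    input_ids = torch.randint(0, 30522, (4, 16))              # toy batch
    head_logits = head(encoder(input_ids).mean(dim=1))        # head-tuning forward pass
    prompted_states = encoder(input_ids, soft_prompt=soft_prompt)  # prompt-tuning pass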

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [article]

Yi-Lin Sung, Jaemin Cho, Mohit Bansal
2022 arXiv   pre-print
Lastly, we present a comprehensive analysis including the combination of adapter and task-specific prompts and the impact of V&L pre-training on adapters.  ...  Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks.  ...  The views, opinions, and/or findings contained in this article are those of the authors and not of the funding agency.  ... 
arXiv:2112.06825v2 fatcat:dil7vyflcjevvcfsjjsqtpnptm
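
As a rough illustration of the adapter idea mentioned in the entry above, here is a minimal bottleneck-adapter sketch in PyTorch; the module name, reduction factor, and placement are assumptions, not the VL-Adapter implementation.

    import torch
    import torch.nn as nn

    class BottleneckAdapter(nn.Module):
        """Small residual bottleneck inserted after a (frozen) transformer sub-layer."""
        def __init__(self, dim=768, reduction=16):
            super().__init__()
            self.down = nn.Linear(dim, dim // reduction)
            self.up = nn.Linear(dim // reduction, dim)
            self.act = nn.GELU()

        def forward(self, hidden):
            return hidden + self.up(self.act(self.down(hidden)))

    # Only adapter (and task-head) parameters receive gradients during training;
    # the pretrained vision-and-language backbone stays frozen.
    adapter = BottleneckAdapter()
    hidden_states = torch.randn(4, 32, 768)  # toy hidden states from a frozen layer
    adapted = adapter(hidden_states)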

AdapterFusion: Non-Destructive Task Composition for Transfer Learning [article]

Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
2021 arXiv   pre-print
We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model.  ...  Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing.  ...  We thank Sebastian Ruder, Max Glockner, Jason Phang, Alex Wang, Katrina Evtimova and Sam Bowman for insightful feedback and suggestions on drafts of this paper.  ... 
arXiv:2005.00247v3 fatcat:rhjexrlidzcjtck5xjqcmaxmxe
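
A sketch of the fusion step described above, assuming per-layer attention over the outputs of several pretrained single-task adapters; the projections, scaling, and residual connection are assumptions rather than the authors' exact formulation.

    import torch
    import torch.nn as nn

    class AdapterFusion(nn.Module):
        """Attention over the outputs of N task adapters at one layer (sketch)."""
        def __init__(self, dim=768):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)
            self.value = nn.Linear(dim, dim)

        def forward(self, hidden, adapter_outputs):
            # hidden: (batch, seq, dim); adapter_outputs: (batch, seq, n_adapters, dim)
            q = self.query(hidden).unsqueeze(2)            # (b, s, 1, d)
            k = self.key(adapter_outputs)                  # (b, s, n, d)
            v = self.value(adapter_outputs)
            scores = (q * k).sum(-1) / k.size(-1) ** 0.5   # (b, s, n)
            weights = scores.softmax(dim=-1).unsqueeze(-1)
            return hidden + (weights * v).sum(dim=2)       # weighted mix of adapters

    fusion = AdapterFusion()
    hidden = torch.randn(2, 16, 768)
    adapter_outs = torch.randn(2, 16, 3, 768)  # outputs of 3 single-task adapters
    fused = fusion(hidden, adapter_outs)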

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks [article]

Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-fu Chang, Lu Yuan
2022 arXiv   pre-print
Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets.  ...  Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research.  ...  Methods In this work, we explore many methods to utilize large-scale pretrained unimodal encoder models to help downstream VL tasks, including direct finetuning, adding adapters on unimodal models, and  ... 
arXiv:2204.10496v2 fatcat:t2lgj4cpxfg6nfdbnnayoud2mq
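
A toy objective in the spirit of the entry above, assuming a frozen unimodal teacher whose features guide a cross-modal student alongside the task loss; the alignment term and weighting are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, labels, student_feats, teacher_feats, alpha=0.5):
        """Task loss plus alignment of student features to a frozen teacher (sketch)."""
        task = F.cross_entropy(student_logits, labels)
        align = F.mse_loss(student_feats, teacher_feats.detach())
        return (1 - alpha) * task + alpha * align

    # toy usage
    logits, labels = torch.randn(8, 3), torch.randint(0, 3, (8,))
    s_feat, t_feat = torch.randn(8, 768), torch.randn(8, 768)
    loss = distillation_loss(logits, labels, s_feat, t_feat)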

Exploring Universal Intrinsic Task Subspace via Prompt Tuning [article]

Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Jing Yi, Weize Chen, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun (+1 others)
2022 arXiv   pre-print
Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of NLP tasks that differ greatly on the surface?  ...  To find such a subspace and examine its universality, we propose an analysis pipeline called intrinsic prompt tuning (IPT).  ...  The mainstream way of downstream adaptation is fine-tuning, which adds task-specific classification heads and tunes all the PLM parameters with supervised data.  ... 
arXiv:2110.07867v2 fatcat:yinwtdo57nfovjvv4bnkqsmwl4
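
The reparametrization behind the intrinsic-subspace idea can be caricatured as tuning one small vector per task and projecting it up to full soft-prompt embeddings; the dimensions and the fixed linear projection below are assumptions for illustration.

    import torch
    import torch.nn as nn

    intrinsic_dim, prompt_len, hidden_dim = 10, 20, 768

    # Shared projection from the low-dimensional subspace to prompt embeddings
    # (in IPT this mapping is first learned on many tasks, then kept fixed).
    projection = nn.Linear(intrinsic_dim, prompt_len * hidden_dim)
    for p in projection.parameters():
        p.requires_grad = False

    # Per-task trainable vector: only `intrinsic_dim` parameters are tuned.
    z = nn.Parameter(torch.zeros(intrinsic_dim))
    soft_prompt = projection(z).view(prompt_len, hidden_dim)  # fed to the frozen PLM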

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning [article]

Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui (+2 others)
2022 arXiv   pre-print
Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own.  ...  Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during  ...  We would also like to thank the authors of Mesh Tensorflow (Shazeer et al., 2018) and T5 , as their high-quality code and paper enabled this work.  ... 
arXiv:2111.10952v2 fatcat:ifwuj2gcufhbhpqg3au4e2ihne

CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [article]

Fei Mi, Yitong Li, Yasheng Wang, Xin Jiang, Qun Liu
2022 arXiv   pre-print
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, i.e. intent classification, dialog state tracking, and  ...  Recently, prompting methods over pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD.  ...  However, the general objectives and tasks during the model pretraining phase are often very different from the formulation of specific downstream ToD tasks.  ... 
arXiv:2109.04645v4 fatcat:ak2l7tjrsrck5lnthbd7in6kjq
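
The (definition, constraint, prompt) schema could be realized as simple template composition before feeding a PLM; the wording below is invented for illustration and is not the paper's template.

    # Hypothetical composition of an instruction for intent classification.
    def build_instruction(definition, constraint, prompt, utterance):
        return f"{definition} {constraint} {prompt} Input: {utterance}"

    instruction = build_instruction(
        definition="Intent classification decides what the user wants.",
        constraint="Choose one intent from: book_flight, cancel_flight, check_status.",
        prompt="The intent of the user utterance is:",
        utterance="I need to get to Boston tomorrow morning.",
    )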

Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer [article]

Yinyi Wei, Tong Mo, Yongtao Jiang, Weiping Li, Wen Zhao
2022 arXiv   pre-print
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning.  ...  Recent advances in prompt-tuning cast few-shot classification tasks as a masked language modeling problem.  ...  As mentioned previously, prompt-tuning, a new paradigm that has emerged recently, can work well with little training data with the help of the pretrained masked language modeling head.  ... 
arXiv:2201.05411v1 fatcat:sl5i323pfjh3dpkjq536qqbo7i
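
A sketch of a prototype-style verbalizer, assuming the [MASK] position's hidden state is scored against learnable class prototypes by cosine similarity; the shapes and the similarity choice are assumptions, not the paper's exact objective.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_classes, hidden_dim = 4, 768
    prototypes = nn.Parameter(torch.randn(num_classes, hidden_dim))  # learnable

    mask_hidden = torch.randn(8, hidden_dim)  # toy [MASK] representations from a PLM
    logits = F.normalize(mask_hidden, dim=-1) @ F.normalize(prototypes, dim=-1).T
    probs = logits.softmax(dim=-1)            # class distribution per example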

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks [article]

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis (+28 others)
2022 arXiv   pre-print
These tasks are collected with contributions of NLP practitioners in the community and through an iterative peer review process to ensure their quality.  ...  To facilitate progress in this goal, we introduce Natural-Instructions v2, a benchmark of 1,600+ diverse language tasks and their expert-written instructions.  ...  We also thank CSE 576 Topics in NLP class students at Arizona State University and all contributors who contributed to the repository.  ... 
arXiv:2204.07705v2 fatcat:ge3pnvgg6fczvcza7efvb6jkfi

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing [article]

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
2021 arXiv   pre-print
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT.  ...  These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch.  ...  ACKNOWLEDGMENTS Kalyan would like to thank his father Katikapalli Subramanyam for giving a) $750 to buy a new laptop, 24inch monitor and study table. b) $180 for one year subscription of Medium, Overleaf  ... 
arXiv:2108.05542v2 fatcat:4uyj6uut65d37hfi7yss2fek6q

WARP: Word-level Adversarial ReProgramming [article]

Karen Hambardzumyan, Hrant Khachatrian, Jonathan May
2021 arXiv   pre-print
Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks.  ...  A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model.  ...  On the sentiment analysis task, the performance is comparable to the fully fine-tuned language models.  ... 
arXiv:2101.00121v2 fatcat:zkzwn7w3qrau7louqmqy3fmlqi
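
A minimal sketch of the trainable pieces in a WARP-style setup: a few prompt embeddings placed around the input and one learned verbalizer embedding per class, scored against the frozen model's output at the [MASK] position; all dimensions are illustrative.

    import torch
    import torch.nn as nn

    hidden_dim, n_prompt, n_classes = 768, 8, 2
    prompt_embeds = nn.Parameter(torch.randn(n_prompt, hidden_dim) * 0.02)
    class_embeds = nn.Parameter(torch.randn(n_classes, hidden_dim) * 0.02)

    # The language model itself stays frozen; only prompt_embeds and class_embeds train.
    mask_output = torch.randn(4, hidden_dim)  # frozen LM hidden state at [MASK]
    logits = mask_output @ class_embeds.T     # per-class scores for 4 examples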

Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

Anna Rogers, Olga Kovaleva, Matthew Downey, Anna Rumshisky
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
QuAIL contains 15K multi-choice questions for 800 texts in 4 domains. Crucially, it offers both general and text-specific questions, unlikely to be found in pretraining data.  ...  Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information.  ...  Acknowledgements This project is funded in part by an NSF CAREER award to Anna Rumshisky (IIS-1652742).  ... 
doi:10.1609/aaai.v34i05.6398 fatcat:fu2eqrg54zevzclhumczzwke5a

Declaration-based Prompt Tuning for Visual Question Answering [article]

Yuhang Liu, Wei Wei, Daowan Peng, Feida Zhu
2022 arXiv   pre-print
model is first optimized via self-supervised task objectives, e.g., masked language modeling (MLM) and image-text matching (ITM), and then fine-tuned to adapt to the downstream task (e.g., VQA) via a brand-new  ...  of VQA model, boosting the effective adaptation of pre-trained VL models to the downstream task.  ...  Inspired by the recent progress of vision-language pretrained models (VL-PTM) and prompt tuning paradigms in the cross-modal domain [Yao et al., 2021; Tsimpoukelli et al., 2021; Radford et al., 2021], in  ... 
arXiv:2205.02456v1 fatcat:47gbc67h6vcs5lxkgghopquaxu

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [article]

Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel
2022 arXiv   pre-print
and masked language modeling), and evaluated with and without multitask prompted finetuning.  ...  We find that pretrained non-causal decoder models can be adapted into performant generative causal decoder models, using autoregressive language modeling as a downstream task.  ...  Specifically, this work was conducted by a task force within the architecture & scaling group, seeking to establish the optimal architecture and pretraining objective for the final 176B parameter model  ... 
arXiv:2204.05832v1 fatcat:srmtrteyozbarl2v5tqu5s7mwe

Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation [article]

Mozhdeh Gheini, Xiang Ren, Jonathan May
2021 arXiv   pre-print
We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings.  ...  We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the  ... 
arXiv:2104.08771v2 fatcat:mtua5a3nbrdg5g5iabnehgw7ui
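
The "fine-tune only cross-attention" recipe can be sketched with a generic encoder-decoder, assuming cross-attention modules are named multihead_attn as in torch.nn.TransformerDecoderLayer; real pretrained translation models may expose different parameter names, and the embeddings that a new source or target language would require are omitted here.

    import torch.nn as nn

    model = nn.Transformer(d_model=512, nhead=8, batch_first=True)

    for name, param in model.named_parameters():
        # Keep only the decoder's cross-attention weights and biases trainable.
        param.requires_grad = "multihead_attn" in name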
Showing results 1 — 15 out of 242 results