
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [article]

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
2020 arXiv   pre-print
In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task.  ...  Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.  ...  Acknowledgments The authors thank Dallas Card, Mark Neumann, Nelson Liu, Eric Wallace, members of the AllenNLP team, and anonymous reviewers for helpful feedback, and Arman Cohan for providing data.  ... 
arXiv:2004.10964v3 fatcat:cwmjixjpcve25hezsj6kgjxdji

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
Unsupervised domain clusters in pretrained language models. In ACL.  ...  In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task.  ...  Acknowledgments The authors thank Dallas Card, Mark Neumann, Nelson Liu, Eric Wallace, members of the AllenNLP team, and anonymous reviewers for helpful feedback, and Arman Cohan for providing data.  ... 
doi:10.18653/v1/2020.acl-main.740 fatcat:2yk5xnaigjdo7bpy6n3l32ef44
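In practice, the domain-/task-adaptive pretraining described in this entry amounts to continuing masked-LM training on unlabeled in-domain (or task) text before fine-tuning on the labeled end task. A minimal sketch with Hugging Face transformers follows; the model choice and the file `domain_corpus.txt` are placeholders, not artifacts of the paper.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "roberta-base"                      # any masked-LM checkpoint works here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled in-domain (or task) text, one example per line (placeholder path).
ds = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
ds = ds.map(lambda batch: tok(batch["text"], truncation=True, max_length=256),
            batched=True, remove_columns=["text"])

# Dynamic masking, 15% of tokens, as in standard MLM pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)
args = TrainingArguments(output_dir="dapt-checkpoint",
                         per_device_train_batch_size=16,
                         num_train_epochs=1, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=ds, data_collator=collator).train()
# The adapted checkpoint in "dapt-checkpoint" is then fine-tuned on the labeled task data.
```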

Investigating Pretrained Language Models for Graph-to-Text Generation [article]

Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych
2021 arXiv   pre-print
In this paper, we investigate two recently proposed pretrained language models (PLMs) and analyze the impact of different task-adaptive pretraining strategies for PLMs in graph-to-text generation.  ...  We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.  ...  Ribeiro is supported by the German Research Foundation (DFG) as part of the Research Training Group "Adaptive Preparation of Information from Heterogeneous Sources" (AIPHES, GRK 1994/1) and as part of  ... 
arXiv:2007.08426v3 fatcat:zwirbzu5gnatth5qzxeyodmlyu

Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining [article]

Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, Qi Zhang
2021 arXiv   pre-print
To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data.  ...  The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization.  ...  Acknowledgements The authors wish to thank the anonymous reviewers for their helpful comments. This work was partially funded by China National Key R&D Program  ... 
arXiv:2109.04080v2 fatcat:yhmcrg7fpncdxlbmz3zbecakfi
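The snippet only says that adversarial critics push the encoder-decoder toward domain-agnostic summarization; the paper's exact architecture is not reproduced here. A common way to implement that idea is a domain classifier trained through a gradient-reversal layer (DANN-style), sketched below in plain PyTorch with illustrative names.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

class DomainCritic(nn.Module):
    """Predicts which source corpus a pooled representation came from."""
    def __init__(self, hidden=768, n_domains=3):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(),
                                 nn.Linear(256, n_domains))
    def forward(self, pooled, lamb=1.0):
        return self.clf(GradReverse.apply(pooled, lamb))

# Toy usage: pooled encoder outputs for a batch of 4 sequences.
critic = DomainCritic()
pooled = torch.randn(4, 768, requires_grad=True)
domain_labels = torch.tensor([0, 1, 2, 0])
loss = nn.CrossEntropyLoss()(critic(pooled), domain_labels)
loss.backward()   # gradients reaching the encoder are reversed -> domain-agnostic features
```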

Semi-Supervised Text Classification via Self-Pretraining [article]

Payam Karisani, Negin Karisani
2021 arXiv   pre-print
We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm.  ...  To improve the flow of information across the iterations and also to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across the iterations  ...  Instances of such tasks are language model pretraining [18, 41] in NLP, and contrastive learning in image processing [15, 46] .  ... 
arXiv:2109.15300v1 fatcat:7moq5cc6m5g2lluqoon4y5li4u
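Self-Pretraining builds on the classic self-training algorithm mentioned in the snippet. Below is a minimal sketch of that underlying loop only (pseudo-label the most confident unlabeled examples each round); the paper's iterative distillation and hypothesis transfer are not reproduced, and all data here is toy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=5, conf=0.9):
    """Classic self-training: repeatedly add confidently pseudo-labeled examples."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        if len(pool) == 0:
            break
        probs = clf.predict_proba(pool)
        confident = probs.max(axis=1) >= conf
        if not confident.any():
            break
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, clf.classes_[probs[confident].argmax(axis=1)]])
        pool = pool[~confident]
    return LogisticRegression(max_iter=1000).fit(X, y)

# Toy data: 2-D points from two well-separated classes.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
model = self_train(X_lab, y_lab, X_unlab)
```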

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [article]

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu
2020 arXiv   pre-print
Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning.  ...  To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.  ...  and a wide range of NLP tasks has been promoted by these pretrained language models.  ... 
arXiv:2004.12651v1 fatcat:zy3poh6lmbb6fksdlnncyds4fa
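As described in the snippet, the recall-and-learn mechanism keeps the pretraining objective in play while learning the downstream task so that fine-tuning forgets less. A rough stand-in for the "recall" term is a quadratic pull toward the pretrained weights; the PyTorch sketch below shows that simplification only and is not the authors' exact optimization procedure.

```python
import torch
import torch.nn as nn

# A toy "pretrained" model and a snapshot of its weights at the start of fine-tuning.
model = nn.Linear(8, 2)
pretrained_state = {n: p.detach().clone() for n, p in model.named_parameters()}

def recall_penalty(model, pretrained_state, coeff=1e-3):
    """Quadratic pull toward the pretrained weights: a cheap proxy for jointly
    'recalling' the pretraining task while learning the downstream one."""
    return coeff * sum(((p - pretrained_state[n]) ** 2).sum()
                       for n, p in model.named_parameters())

x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(x), y) + recall_penalty(model, pretrained_state)
loss.backward()   # downstream gradient plus a term resisting drift from pretraining
```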

Pretrained Transformers for Text Ranking: BERT and Beyond [article]

Jimmy Lin, Rodrigo Nogueira, Andrew Yates
2021 arXiv   pre-print
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond.  ...  Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications.  ...  Special thanks goes out to two anonymous reviewers for their insightful comments and helpful feedback.  ... 
arXiv:2010.06467v3 fatcat:obla6reejzemvlqhvgvj77fgoy

Pretrained Transformers for Text Ranking: BERT and Beyond

Andrew Yates, Rodrigo Nogueira, Jimmy Lin
2021 Proceedings of the 14th ACM International Conference on Web Search and Data Mining  
In the context of text ranking, these models produce high quality results across many domains, tasks, and settings.  ...  The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond.  ...  Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques  ... 
doi:10.1145/3437963.3441667 fatcat:6teqmlndtrgfvk5mneq5l7ecvq
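One of the standard recipes this survey covers is relevance classification with a cross-encoder: the query and each candidate text are scored jointly by a pretrained transformer. A minimal sketch with Hugging Face transformers follows; the checkpoint name is just one example of a publicly released reranker, and the query and passages are toy inputs.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example cross-encoder reranking checkpoint from the model hub.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

query = "do goldfish grow"
passages = ["Goldfish can grow to over a foot long when kept in ponds.",
            "The stock market closed higher on Friday."]

with torch.no_grad():
    enc = tok([query] * len(passages), passages, padding=True,
              truncation=True, return_tensors="pt")
    scores = model(**enc).logits.squeeze(-1)      # one relevance score per (query, passage) pair

# Re-rank candidates by descending relevance score.
ranked = [p for _, p in sorted(zip(scores.tolist(), passages), reverse=True)]
```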

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design [article]

Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua
2022 arXiv   pre-print
Second, our result clearly indicates further improvements to be made in NLM pretraining for the benefit of Natural Language Understanding tasks.  ...  As an example, we propose "kNN-Pretraining": we show that including semantically related non-neighboring sentences in the same pretraining example yields improved sentence representations and open domain  ...  Don't stop pretraining: Adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964, 2020. Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang.  ... 
arXiv:2110.04541v3 fatcat:rcopmyyblzbdxe2rz7df6jmwfy
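The "kNN-Pretraining" idea in this snippet is to assemble each pretraining example from semantically related sentences rather than from adjacent text. A rough sketch of that assembly step using off-the-shelf sentence embeddings is shown below; the embedding model and the tiny corpus are placeholders, and this is not the authors' pipeline.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

sentences = [
    "The cell membrane regulates what enters the cell.",
    "Stock prices fell sharply after the announcement.",
    "Transport across the membrane can be passive or active.",
    "Investors reacted to the central bank's rate decision.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # example embedding model
emb = embedder.encode(sentences, normalize_embeddings=True)

# For each sentence, retrieve its k nearest neighbors and pack them into one example.
k = 1
knn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(emb)
_, idx = knn.kneighbors(emb)
# Each row starts with the sentence itself followed by its k nearest neighbors.
examples = [" ".join(sentences[j] for j in row) for row in idx]
```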

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining [article]

Ananya B. Sai, Akash Kumar Mohankumar, Siddhartha Arora, Mitesh M. Khapra
2020 arXiv   pre-print
These models aim to assign a high score to all relevant responses and a low score to all irrelevant responses.  ...  To check if large scale pretraining could help, we propose a new BERT-based evaluation metric called DEB, which is pretrained on 727M Reddit conversations and then finetuned on our dataset.  ...  resources required to carry out this research.  ... 
arXiv:2009.11321v1 fatcat:jjlr3phvovhmrobkfn2ynnxnny

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog [article]

Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš
2021 arXiv   pre-print
In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for task-oriented dialog.  ...  Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented  ...  Dialog Pretrained Language Models.  ... 
arXiv:2110.08395v1 fatcat:ipv37ot5drexvoh4rwfbphn7wm

Time Waits for No One! Analysis and Challenges of Temporal Misalignment [article]

Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith
2021 arXiv   pre-print
Our study is focused on the ubiquitous setting where a pretrained model is optionally adapted through continued domain-specific pretraining, followed by task-specific finetuning.  ...  In this work, we establish a suite of eight diverse tasks across different domains (social media, science papers, news, and reviews) and periods of time (spanning five years or more) to quantify the effects  ...  Deep contextualized word representations. In NAACL.  ...  and Noah A. Smith. 2020. Don't stop pretraining: Adapt language models to domains and tasks.  ... 
arXiv:2111.07408v1 fatcat:zr6xdyoicvgzfgglve3j5gmvrm

Training Neural Response Selection for Task-Oriented Dialogue Systems

Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, Pei-Hao Su
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
To train response selection models for task-oriented dialogue tasks, we propose a novel method which: 1) pretrains the response selection model on large general-domain conversational corpora; and then 2  ...  Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue.  ...  Which method can efficiently adapt the pretrained model to a spectrum of target dialogue domains?  ... 
doi:10.18653/v1/p19-1536 dblp:conf/acl/HendersonVGCBCS19 fatcat:c63xc4uv2zc3lmiy32czv4smae
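Response selection of the kind described here is commonly trained as a dual encoder: every (context, response) pair in a batch is scored, and the other responses in the batch serve as negatives. A bare-bones PyTorch sketch of that training objective follows; the tiny encoders are stand-ins for the paper's models, not a reproduction of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in text encoder: mean-pooled token embeddings (a real system would use a transformer)."""
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
    def forward(self, ids):                               # ids: (batch, seq_len)
        return F.normalize(self.emb(ids).mean(dim=1), dim=-1)

ctx_enc, rsp_enc = TinyEncoder(), TinyEncoder()
contexts = torch.randint(0, 10000, (8, 20))               # toy token ids
responses = torch.randint(0, 10000, (8, 12))

# Similarity matrix: entry (i, j) scores context i against response j.
sims = ctx_enc(contexts) @ rsp_enc(responses).T           # (8, 8)
labels = torch.arange(sims.size(0))                       # matching response is on the diagonal
loss = F.cross_entropy(sims / 0.05, labels)               # in-batch negatives, temperature 0.05
loss.backward()
```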

Evaluating Deep Learning Approaches for Covid19 Fake News Detection [article]

Apurva Wani, Isha Joshi, Snehal Khandve, Vedangi Wagh, Raviraj Joshi
2021 arXiv   pre-print
We also evaluate the importance of unsupervised learning in the form of language model pre-training and distributed word representations using unlabelled covid tweets corpus.  ...  Therefore it is important to curb fake news at source and prevent it from spreading to a larger audience. We look at automated techniques for fake news detection from a data mining perspective.  ...  We would also like to thank the competition organizers for providing us an opportunity to explore the domain.  ... 
arXiv:2101.04012v2 fatcat:2ny3dzconfhxhgqzbm3mtm53y4

Neural Supervised Domain Adaptation by Augmenting Pre-trained Models with Random Units [article]

Sara Meftah, Nasredine Semmar, Youssef Tamaazousti, Hassane Essafi, Fatiha Sadat
2021 arXiv   pre-print
We show that our approach exhibits significant improvements over the standard fine-tuning scheme for neural domain adaptation from the news domain to the social media domain on four NLP tasks: part-of-speech  ...  Neural Transfer Learning (TL) is becoming ubiquitous in Natural Language Processing (NLP), thanks to its high performance on many tasks, especially in low-resourced scenarios.  ...  Domain adaptation consists in adapting NLP models designed for specific high-resourced source setting(s) (language, language variety, domain, task, etc.) to work in a target low-resourced setting(s).  ...  Pretraining  ... 
arXiv:2106.04935v1 fatcat:blerqmjokfc73insvdudxnpzhy
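The title's "augmenting pre-trained models with random units" suggests combining a pretrained branch with freshly initialized units during target-domain fine-tuning. The PyTorch sketch below is a loose illustration of that idea with learned mixing weights; it is an assumption-laden simplification, not the authors' architecture, and all names are made up.

```python
import torch
import torch.nn as nn

class AugmentedLayer(nn.Module):
    """Pretrained projection plus randomly initialized units, combined with learned scalars."""
    def __init__(self, pretrained_linear: nn.Linear, n_random_units: int):
        super().__init__()
        self.pre = pretrained_linear                       # weights come from the source model
        self.rand = nn.Linear(pretrained_linear.in_features, n_random_units)
        self.w_pre = nn.Parameter(torch.ones(1))           # learned mixing weights
        self.w_rand = nn.Parameter(torch.ones(1))
    def forward(self, x):
        return torch.cat([self.w_pre * self.pre(x),
                          self.w_rand * self.rand(x)], dim=-1)

# Toy usage: wrap a "pretrained" 64->32 projection and add 16 random units.
layer = AugmentedLayer(nn.Linear(64, 32), n_random_units=16)
out = layer(torch.randn(4, 64))                            # output shape: (4, 48)
```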
Showing results 1 — 15 out of 1,081 results