Muppet: Massive Multi-task Representations with Pre-Finetuning
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks, while also significantly improving sample efficiency during fine-tuning.
doi:10.18653/v1/2021.emnlp-main.468
fatcat:e73mympdubfr5dpkr4r6ynjzra
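The abstract describes pre-finetuning only at a high level. Below is a minimal PyTorch sketch of the general idea of massively multi-task training over many labeled datasets: a single shared encoder with one lightweight head per task, optimized jointly across per-task losses. The tiny encoder, the random stand-in datasets, the task list, and the log-based loss scaling are illustrative assumptions for this sketch, not the paper's actual implementation.

# Minimal multi-task training sketch (illustrative; not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)

EMBED_DIM = 64

class SharedEncoder(nn.Module):
    """Stands in for a pretrained model such as RoBERTa or BART."""
    def __init__(self, vocab_size=1000, dim=EMBED_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, token_ids):
        # Mean-pool token embeddings into a single sequence representation.
        return self.mlp(self.embed(token_ids).mean(dim=1))

# Hypothetical tasks; each gets its own classification head on the shared encoder.
tasks = {
    "sentiment": 2,  # binary sentiment task (assumed)
    "nli": 3,        # 3-way entailment task (assumed)
    "topic": 5,      # 5-way topic task (assumed)
}

encoder = SharedEncoder()
heads = nn.ModuleDict({name: nn.Linear(EMBED_DIM, n_cls) for name, n_cls in tasks.items()})
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def fake_batch(n_classes, batch_size=8, seq_len=16):
    """Random stand-in for a labeled batch from one of the many datasets."""
    x = torch.randint(0, 1000, (batch_size, seq_len))
    y = torch.randint(0, n_classes, (batch_size,))
    return x, y

for step in range(100):
    optimizer.zero_grad()
    total_loss = 0.0
    # Accumulate one batch per task into a single multi-task update.
    for name, n_classes in tasks.items():
        x, y = fake_batch(n_classes)
        logits = heads[name](encoder(x))
        # Scale each task's loss so tasks with many classes do not dominate
        # (a common multi-task heuristic; shown here as an assumption).
        total_loss = total_loss + loss_fn(logits, y) / torch.log(torch.tensor(float(n_classes)))
    total_loss.backward()
    optimizer.step()

In this sketch the shared encoder plays the role of the pre-trained model whose representations are being pre-finetuned; after the multi-task stage, the encoder would be fine-tuned on each downstream task individually.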