NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework [article]

Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
2022 arXiv preprint
Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.
arXiv:2111.04130v2
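
The abstract describes two components: retrieving a small, task-relevant subset of a general corpus using the labeled task data as queries, and training from scratch on a joint task-plus-language-modeling objective. The sketch below illustrates that recipe at a conceptual level only, assuming a toy in-memory corpus, a simple term-overlap retriever as a stand-in for the paper's BM25-style retrieval, and illustrative loss-weighting hyperparameters (`rho1`, `rho2`). The function names and data are hypothetical, not the authors' released implementation.

```python
from collections import Counter


def retrieve_subset(task_texts, general_corpus, k_per_query=2):
    """Use each labeled task example as a query and keep the top-k
    general-corpus documents by a simple term-overlap score
    (a crude stand-in for the BM25-style retrieval in the paper)."""
    selected = set()
    for query in task_texts:
        q_terms = Counter(query.lower().split())
        scores = []
        for idx, doc in enumerate(general_corpus):
            d_terms = Counter(doc.lower().split())
            overlap = sum((q_terms & d_terms).values())
            scores.append((overlap, idx))
        # keep the k best-matching documents for this query
        for _, idx in sorted(scores, reverse=True)[:k_per_query]:
            selected.add(idx)
    return [general_corpus[i] for i in sorted(selected)]


def joint_loss(task_loss, lm_loss_task, lm_loss_retrieved, rho1=1.0, rho2=1.0):
    """Joint objective optimized from scratch: the supervised task loss plus
    language-modeling losses on the task data and on the retrieved subset.
    The rho1/rho2 weights are illustrative hyperparameters."""
    return task_loss + rho1 * lm_loss_task + rho2 * lm_loss_retrieved


if __name__ == "__main__":
    task_texts = ["the movie was wonderful", "terrible acting and plot"]
    general_corpus = [
        "a wonderful film with great acting",
        "stock prices fell sharply today",
        "the plot of the novel was terrible",
        "new species of beetle discovered",
    ]
    subset = retrieve_subset(task_texts, general_corpus)
    print("Retrieved subset:", subset)
```

In the actual framework, the retrieved subset replaces a full-scale pretraining corpus, and the joint objective is optimized with a transformer trained from random initialization; the snippet above only makes the data-selection and loss-combination structure concrete.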