ScholarBERT: Bigger is Not Always Better [article]

Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Carl Malamud, Roger Magoulas, Kyle Chard, Ian Foster
2022 arXiv   pre-print
Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g.,  ...  In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning  ...  More parameters and a larger pretraining corpus did not always lead to big increases in F-1 scores, as we had initially expected.  ...
arXiv:2205.11342v1 fatcat:dtmfpp4ecnfdjkdvnspv6o5f7u