Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., ... In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning ... More parameters and a larger pretraining corpus did not always lead to large increases in F-1 scores, as we had initially expected. ...

arXiv:2205.11342v1
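To make the evaluation the abstract refers to concrete, below is a minimal sketch of the scoring step for one downstream task: running a BERT-style sequence classifier and reporting an F-1 score. This is an illustration under our own assumptions, not the paper's actual setup; the checkpoint name, example texts, and labels are placeholders (a fine-tuned scientific checkpoint would be substituted for the generic one).

```python
# Minimal sketch: score a BERT-style classifier on a toy task and report F-1,
# the metric the study tracks across model and corpus scales.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score

checkpoint = "bert-base-uncased"  # placeholder; not one of the paper's 14 models
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy inputs and labels standing in for a real scientific benchmark.
texts = ["The enzyme catalyzes hydrolysis.", "Stock prices fell sharply."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
model.eval()
with torch.no_grad():
    logits = model(**batch).logits
preds = logits.argmax(dim=-1)

print("F-1:", f1_score(labels.numpy(), preds.numpy(), average="macro"))
```

In the study's framing, this same scoring loop would be repeated per model and per task, which is what allows F-1 to be compared as parameters and pretraining-corpus size vary.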