The file type is application/pdf.
Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model
2022
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change
unpublished
In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years
doi:10.18653/v1/2022.lchange-1.7
fatcat:vmebyeb3zzajlcvj3fapf2jwpi