Named entity recognition from scanned and OCRed historical documents can contribute to historical research. However, entity recognition from historical documents is more difficult than from natively digital data because of word errors and the absence of complete formatting information. We apply four extraction algorithms to various types of noisy OCR data found "in the wild" and focus on full name extraction. We evaluate the extraction quality with respect to hand-labeled test…
doi:10.1145/1871840.1871845 dblp:conf/and/PackerLSERSJ10 fatcat:6hph2wedg5ef7mzdwxi5g7qt4y