Assessing the Impact of OCR Errors in Information Retrieval
[chapter]
2020
Lecture Notes in Computer Science
A significant amount of the textual content available on the Web is stored in PDF files. These files are typically converted into plain text before they can be processed by information retrieval or text mining systems. Automatic conversion often introduces errors, especially when OCR is required. In this empirical study, we simulate OCR errors and investigate the impact that misspelled words have on retrieval accuracy. To quantify this impact, errors were systematically …
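The abstract does not say how the errors were generated, so the following is only a minimal sketch of one way to simulate OCR errors. It assumes character-level substitutions drawn from a small, hand-picked table of visually confusable characters (for example `m` read as `rn`) and applied to a chosen fraction of words. The table `CONFUSIONS`, the functions `corrupt_word` and `simulate_ocr_errors`, and the `word_error_rate` parameter are all hypothetical. They are not taken from the paper and are not presented as the authors' procedure.

```python
import random

# Hypothetical OCR confusion table (an assumption for illustration only,
# not the error model used in the paper). Every replacement differs from
# the original character, so a corrupted word always changes.
CONFUSIONS = {"e": "c", "c": "e", "l": "1", "i": "l", "o": "0", "m": "rn", "h": "b", "u": "v"}


def corrupt_word(word: str, rng: random.Random) -> str:
    """Replace one confusable character in `word`, if any, with its OCR look-alike."""
    positions = [i for i, ch in enumerate(word) if ch in CONFUSIONS]
    if not positions:
        return word  # nothing confusable: leave the word intact
    i = rng.choice(positions)
    return word[:i] + CONFUSIONS[word[i]] + word[i + 1:]


def simulate_ocr_errors(text: str, word_error_rate: float, seed: int = 0) -> str:
    """Corrupt roughly `word_error_rate` of the whitespace-separated tokens."""
    if not 0.0 <= word_error_rate <= 1.0:
        raise ValueError("word_error_rate must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed makes the corruption reproducible
    words = text.split()
    return " ".join(
        corrupt_word(w, rng) if rng.random() < word_error_rate else w for w in words
    )


if __name__ == "__main__":
    doc = "information retrieval systems index plain text extracted from documents"
    for rate in (0.0, 0.3, 1.0):
        print(rate, simulate_ocr_errors(doc, rate, seed=42))
```

In an experiment along these lines, one would corrupt the collection (or the queries) at increasing rates, re-index, and compare a retrieval metric against the clean baseline. A fixed seed keeps each corrupted collection reproducible across runs. A realistic OCR error model would more likely be estimated from real OCR output than hand-written as above.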
doi:10.1007/978-3-030-45442-5_13
fatcat:uignn2cccfc7dlko4yhxppykae