A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Representing OCRed documents in HTML
Proceedings of the Fourth International Conference on Document Analysis and Recognition
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in reading and understanding if they do not refer to the original image representation. As demonstrated in this paper, a hybrid document which combines symbolic representation and image representation may relieve the problem. If we represent an OCRed document properly in HTML, OCR errors will not have much negative effect on the
doi:10.1109/icdar.1997.620628
dblp:conf/icdar/HongS97
fatcat:r6yrzgboubau7cxrnhcwdy7zje