An automatic linking service of document images reducing the effects of OCR errors with latent semantics

Renato F. Bulcão-Neto, José Camacho-Guerrero, Álvaro Barreiro, Javier Parapar, Alessandra A. Macedo
2010 Proceedings of the 2010 ACM Symposium on Applied Computing - SAC '10  
Robust Information Retrieval (IR) systems are in demand because document images are used widely and for many purposes, and because a large number of document image repositories are now available. This paper presents a novel approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). The LinkDI service extracts and indexes the content of document images, obtains its latent semantics, and [...] relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their respective document images. Results show the feasibility of using LinkDI to relate OCR output with high degradation.
doi:10.1145/1774088.1774092 dblp:conf/sac/NetoGBPM10 fatcat:xqujwg56qvhgrm7zev6467qfna
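
The abstract describes a pipeline of indexing OCR text, deriving its latent semantics with LSI, and linking related items. The following is a minimal sketch of that general idea, not LinkDI's actual implementation: it assumes OCR output is already available as plain strings, uses raw term counts without weighting, and computes the rank-k LSI document vectors from the document-by-document Gram matrix with power iteration so that only the standard library is needed. All function names, the threshold, and the toy corpus are hypothetical.

```python
import math
import re


def term_document_counts(docs):
    """Return (vocabulary, matrix) with matrix[t][d] = count of term t in doc d."""
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for d, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok]][d] += 1.0
    return vocab, matrix


def top_eigenpairs(sym, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        # Fixed, normalized start vector keeps runs deterministic.
        vec = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        image = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
        value = sum(vec[i] * image[i] for i in range(n))  # Rayleigh quotient
        if value < 1e-12:
            break  # remaining matrix is numerically zero: rank exhausted
        pairs.append((value, vec))
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]  # deflate found component
    return pairs


def lsi_document_vectors(docs, k):
    """Rank-k LSI coordinates of each document (rows of V_k * Sigma_k)."""
    _vocab, a = term_document_counts(docs)
    n = len(docs)
    # Gram matrix A^T A: its eigenvalues are the squared singular values of A,
    # and its eigenvectors are the right singular vectors.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(value) * vec[d] for value, vec in pairs] for d in range(n)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def link_documents(docs, k=2, threshold=0.9):
    """Return index pairs (i, j), i < j, whose latent-space cosine >= threshold."""
    coords = lsi_document_vectors(docs, k)
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if cosine(coords[i], coords[j]) >= threshold
    ]


if __name__ == "__main__":
    # Toy corpus: items 1 and 3 imitate OCR confusions ("m" read as "rn").
    corpus = [
        "latent semantic indexing of document images",
        "latent semantic indexing of docurnent irnages",
        "football match results and league scores",
        "football rnatch results and league scores",
    ]
    print(link_documents(corpus, k=2, threshold=0.9))
```

On this toy corpus each clean text is linked to its degraded copy even though the misrecognized words never match exactly, because the shared terms place both versions on the same latent dimension. A real system would likely add term weighting and a proper SVD routine; those choices are outside what the abstract specifies.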