Improved string matching under noisy channel conditions

Kevyn Collins-Thompson, Charles Schweizer, Susan Dumais
Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM '01), 2001
Many document-based applications, including popular Web browsers, email viewers, and word processors, have a 'Find on this Page' feature that allows a user to find every occurrence of a given string in the document. If the document text being searched is derived from a noisy process such as optical character recognition (OCR), the effectiveness of typical string matching can be greatly reduced. This paper describes an enhanced string-matching algorithm for degraded text that improves recall, [...] while keeping precision at acceptable levels. The algorithm is more general than most approximate matching algorithms and allows string-to-string edits with arbitrary costs. We develop a method for evaluating our technique and use it to examine the relative effectiveness of each sub-component of the algorithm. Of the components we varied, we find that using confidence information from the recognition process leads to the largest improvements in matching accuracy.
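The abstract mentions two ingredients: string-to-string edits with arbitrary costs, and OCR confidence information. The sketch below shows one way these might fit into a weighted approximate substring search. It is an illustration under stated assumptions, not the authors' algorithm. It assumes a dynamic-programming alignment where a match may begin anywhere in the text. It also assumes a hypothetical rule format `(pattern_substring, ocr_substring, cost)`, for example `("m", "rn", 0.2)` for the classic OCR confusion of "m" with "rn". Per-character confidences in [0, 1] scale down the cost of any edit that touches a low-confidence character. The function name `weighted_find`, its parameters, and this particular confidence scaling are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical rule format: (pattern_substring, ocr_substring, cost).
Rule = Tuple[str, str, float]


def weighted_find(pattern: str, text: str, rules: Sequence[Rule],
                  max_cost: float,
                  confidence: Optional[Sequence[float]] = None,
                  indel_cost: float = 1.0,
                  sub_cost: float = 1.0) -> List[Tuple[int, int, float]]:
    """Return (start, end, cost) for each end position where some text[start:end]
    matches `pattern` with total edit cost <= max_cost (sketch, not the paper's method)."""
    n, m = len(pattern), len(text)
    conf = list(confidence) if confidence is not None else [1.0] * m
    INF = float("inf")

    def scale(a: int, b: int) -> float:
        # Assumption: edits touching text[a:b] cost less when OCR confidence is low.
        return sum(conf[a:b]) / (b - a) if b > a else 1.0

    # D[i][j] = (cost, start): cheapest alignment of pattern[:i] with a text
    # substring ending at j. Row 0 is free everywhere, so a match may start anywhere.
    D = [[(INF, 0)] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = (0.0, j)

    for i in range(1, n + 1):
        for j in range(m + 1):
            # Pattern character missing from the text.
            best = (D[i - 1][j][0] + indel_cost, D[i - 1][j][1])
            if j > 0:
                # Extra (spurious) character in the text.
                c, s = D[i][j - 1]
                best = min(best, (c + indel_cost * scale(j - 1, j), s))
                # Exact match or single-character substitution.
                c, s = D[i - 1][j - 1]
                step = 0.0 if pattern[i - 1] == text[j - 1] else sub_cost * scale(j - 1, j)
                best = min(best, (c + step, s))
            # String-to-string edits with arbitrary costs.
            for a, b, cost in rules:
                la, lb = len(a), len(b)
                if la <= i and lb <= j and pattern[i - la:i] == a and text[j - lb:j] == b:
                    c, s = D[i - la][j - lb]
                    best = min(best, (c + cost * scale(j - lb, j), s))
            D[i][j] = best

    return [(D[n][j][1], j, D[n][j][0])
            for j in range(1, m + 1) if D[n][j][0] <= max_cost]
```

Here is an example. Searching for "modern" in the OCR output "the rnodern era" with the rule `("m", "rn", 0.2)` and `max_cost=0.5` returns a single hit, `(4, 11, 0.2)`. Without the rule, the cheapest alignment costs 1.0, so nothing is found at that threshold. This illustrates how cheap multi-character edits can raise recall. The threshold is what keeps precision in check. The table runs in O(|pattern| × |text| × |rules|) time. A production version would probably index the rules by their last character rather than scanning all of them at every cell.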
doi:10.1145/502585.502646 dblp:conf/cikm/Collins-ThompsonSD01 fatcat:fk62h25shrcanbofkfxa4pbgi4