POST-EDITING THROUGH APPROXIMATION AND GLOBAL CORRECTION

KAZEM TAGHVA, JULIE BORSACK, BRYAN BULLARD, ALLEN CONDIT
1995, International Journal of Pattern Recognition and Artificial Intelligence
This paper describes a new automatic spelling correction program for dealing with OCR-generated errors. The method is based on three principles:

1. Approximate string matching between the misspellings and the terms occurring in the database, as opposed to the entire dictionary.
2. Local information obtained from the individual documents.
3. The use of a confusion matrix, which contains information specific to the nature of the errors caused by the particular OCR device.

This system is [...]en utilized to process approximately 10,000 pages of OCR-generated documents. Of the misspellings discovered by this algorithm, about 87% were corrected.
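The three principles can be combined in a small sketch. This is not the authors' implementation. It assumes a weighted Levenshtein distance in which the substitution cost comes from a confusion matrix, and candidates are drawn only from the terms seen in the document collection. Ties are broken by how often each candidate appears in the current document, which is one possible reading of "local information." The confusion costs, the `max_dist` threshold, and the function names `sub_cost`, `weighted_edit_distance`, and `correct` are hypothetical.

```python
from collections import Counter

# Hypothetical confusion matrix: (ocr_char, true_char) -> substitution cost.
# Pairs that the OCR device often confuses get a cost below 1.0.
CONFUSION_COST = {
    ("1", "l"): 0.2,
    ("0", "o"): 0.2,
    ("5", "s"): 0.3,
    ("c", "e"): 0.4,
}

def sub_cost(ocr_ch: str, true_ch: str) -> float:
    """Cost of the OCR device reading true_ch as ocr_ch."""
    if ocr_ch == true_ch:
        return 0.0
    return CONFUSION_COST.get((ocr_ch, true_ch), 1.0)

def weighted_edit_distance(ocr: str, cand: str) -> float:
    """Levenshtein distance with confusion-weighted substitutions."""
    n = len(cand)
    prev = [float(j) for j in range(n + 1)]
    for i in range(1, len(ocr) + 1):
        cur = [float(i)] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1.0,      # delete a character from the OCR word
                cur[j - 1] + 1.0,   # insert a character into the OCR word
                prev[j - 1] + sub_cost(ocr[i - 1], cand[j - 1]),  # substitute
            )
        prev = cur
    return prev[n]

def correct(word, collection_terms, document_tokens, max_dist=1.0):
    """Return the closest collection term within max_dist, or None.

    Candidates come from the collection's own terms, not a full dictionary.
    Ties in distance go to the term that is more frequent in this document.
    """
    local_freq = Counter(document_tokens)
    best, best_key = None, None
    for term in collection_terms:
        d = weighted_edit_distance(word, term)
        if d > max_dist:
            continue
        key = (d, -local_freq[term], term)
        if best_key is None or key < best_key:
            best, best_key = term, key
    return best
```

For example, `correct("ce1l", {"cell", "call", "tell"}, ["cell", "call"])` picks `"cell"`. The `1`/`l` confusion costs only 0.2, while the other candidates need an extra full-cost substitution. Scanning every collection term is linear in the lexicon size. A production system would likely prune candidates first, for example by length or shared n-grams.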
doi:10.1142/s0218001495000377 fatcat:jyczv7w7ynhd7amxu77uqhdoti