Integrating diverse knowledge sources in text recognition

Sargur N. Srihari, Jonathan J. Hull, Ramesh Choudhari
1983 ACM Transactions on Information Systems  
A new algorithm for text recognition that corrects character substitution errors in words of text is presented. The search for a correct word effectively integrates three knowledge sources: channel characteristics, bottom-up context, and top-down context. Channel characteristics are used in the form of probabilities that observed letters are corruptions of other letters; bottom-up context is in the form of the probability of a letter when the previous letters of the word are known; and top-down context is in the form of a lexicon. A one-pass algorithm is obtained by merging a previously known dynamic programming algorithm to compute the maximum a posteriori probability string (known as the Viterbi algorithm) with searching a lexical trie. Analysis of the computational complexity of the algorithm and results of experimentation with a PASCAL implementation are presented.
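
The way the three knowledge sources combine can be illustrated with a minimal Python sketch. It is not the authors' PASCAL implementation and simplifies the method in the abstract: it scores only lexicon words of the same length as the observed word, uses first-order letter transitions as the bottom-up context, and keeps every surviving trie prefix at each position. The names `build_trie`, `correct_word`, `confusion`, `transition`, and `floor`, as well as the toy probabilities, are assumptions made for illustration.

```python
import math

def build_trie(words):
    """Build a nested-dict trie; the key '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def correct_word(observed, trie, confusion, transition, floor=1e-6):
    """Return the best-scoring lexicon word of the same length, or None.

    confusion[(true, seen)] ~ P(seen | true)  (channel characteristics)
    transition[(prev, cur)] ~ P(cur | prev)   (bottom-up context; prev is
                                               '^' at the start of a word)
    Missing entries fall back to `floor` (an assumption of this sketch).
    """
    # Each frontier entry: (trie node, prefix spelled so far, log score).
    frontier = [(trie, "", 0.0)]
    for seen in observed:  # a single left-to-right pass over the input
        next_frontier = []
        for node, prefix, score in frontier:
            prev = prefix[-1] if prefix else "^"
            for ch, child in node.items():
                if ch == "$":
                    continue  # end-of-word marker, not a letter
                step = (math.log(confusion.get((ch, seen), floor))
                        + math.log(transition.get((prev, ch), floor)))
                next_frontier.append((child, prefix + ch, score + step))
        frontier = next_frontier
    # Top-down context: only prefixes that end a lexicon word are accepted.
    complete = [(score, prefix) for node, prefix, score in frontier
                if "$" in node]
    return max(complete)[1] if complete else None

# Toy data (hypothetical): every letter is read correctly with
# probability 0.9, and 'a' is misread as 'e' more often than 'o' is.
lexicon = build_trie(["cat", "car", "cot", "dog"])
confusion = {(c, c): 0.9 for c in "abcdefghijklmnopqrstuvwxyz"}
confusion[("a", "e")] = 0.08
confusion[("o", "e")] = 0.02
print(correct_word("cet", lexicon, confusion, transition={}))
```

Because the trie discards any prefix that cannot lead to a lexicon word, the search never scores letter strings outside the lexicon, which is the role the abstract assigns to top-down context.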
doi:10.1145/357423.357428 fatcat:ssrlio6rajfk7imyz4usjkljsm