Adding linguistic constraints to document image decoding: comparing the iterated complete path and stack algorithms

Kris Popat, Daniel H. Greene, Justin K. Romberg, Dan S. Bloomberg, Jiangying Zhou, Paul B. Kantor, Daniel P. Lopresti
2000 Document Recognition and Retrieval VIII  
Beginning with an observed document image and a model of how the image has been degraded, Document Image Decoding (DID) recognizes printed text by attempting to find a most probable path through a hypothesized Markov source. The incorporation of linguistic constraints, which are expressed by a sequential predictive probabilistic language model, can improve recognition accuracy significantly in the case of moderately to severely corrupted documents. Two methods of incorporating linguistic constraints in the best-path search are described, analyzed, and compared. The first, called the iterated complete path (ICP) algorithm, involves iteratively rescoring complete paths using conditional language model probability distributions of increasing order, expanding state only as necessary with each iteration. A property of this approach is that it results in a solution that is exactly optimal with respect to the specified source, degradation, and language models; no approximation is necessary. The second approach considered is the Stack algorithm, which is often used in speech recognition and in the decoding of convolutional codes. Experimental results are presented in which text line images that have been corrupted in a known way are recognized using both the ICP and Stack algorithms. This controlled experimental setting preserves many of the essential features and challenges of real text line decoding, while highlighting the important algorithmic issues.

Recently, a technique for incorporating linguistic constraints into DID was proposed and partially explored using a simulated, one-dimensional Morse-code signaling scheme having known corruption parameters. While useful in illustrating the functioning of the proposed algorithm, that treatment did not place the technique in sufficient perspective to draw conclusions about it. In this paper, we examine that technique more closely, and compare it with the Stack algorithm, which is a standard, widely used alternative. In addition, we replace the one-dimensional Morse-code setting with one involving synthetic two-dimensional text-line images. For methodological reasons, we continue to exercise tight control over the manner in which the images are produced and corrupted. Nevertheless, working on two-dimensional images of printed text improves both the realism and the relevance of the resulting comparison and analysis over the previous experimental framework.
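To make the role of the language model in a stack-style search concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting in which the line is already divided into fixed character positions (real DID must also infer segmentation), each position supplies a hypothetical likelihood for every character, and the language model is a hypothetical bigram table. The function name `stack_decode` and all of the data are illustrative. The sketch also omits the length bias and stack-size limit that practical stack decoders typically use; because every step cost here is a non-negative negative log-probability, the first complete hypothesis popped is the cheapest one.

```python
import heapq
import math


def stack_decode(obs_likelihoods, bigram, alphabet, start="^"):
    """Best-first (stack-style) search for the most probable character string.

    obs_likelihoods[t][c]: assumed P(observation at position t | character c), at most 1.
    bigram[(prev, c)]:     assumed P(c | prev) from a toy bigram language model.
    Returns (decoded text, total negative log-probability).
    """
    # Each entry on the stack is (accumulated cost, text decoded so far).
    stack = [(0.0, "")]
    while stack:
        cost, text = heapq.heappop(stack)  # most promising partial path
        t = len(text)
        if t == len(obs_likelihoods):
            return text, cost  # complete path; cheapest because costs are >= 0
        prev = text[-1] if text else start
        for c in alphabet:
            p = obs_likelihoods[t].get(c, 0.0) * bigram.get((prev, c), 0.0)
            if p > 0.0:
                heapq.heappush(stack, (cost - math.log(p), text + c))
    return None, math.inf


# Toy data (illustrative only): the image evidence alone slightly favors "aa",
# but the bigram model makes a repeated character unlikely.
alphabet = "ab"
obs = [{"a": 0.6, "b": 0.4}, {"a": 0.55, "b": 0.45}]
bigram = {
    ("^", "a"): 0.5, ("^", "b"): 0.5,
    ("a", "a"): 0.1, ("a", "b"): 0.9,
    ("b", "a"): 0.9, ("b", "b"): 0.1,
}
text, cost = stack_decode(obs, bigram, alphabet)
```

In this toy case the observation likelihoods alone would pick "aa", while the combined score prefers a string that the bigram model finds plausible, which is the kind of correction linguistic constraints are meant to provide on corrupted input.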
doi:10.1117/12.410844 dblp:conf/drr/PopatGRB01 fatcat:hq77qflv2vdytnckz3wsf4mjw4