Adaptive Hindi OCR using generalized Hausdorff image comparison

Huanfeng Ma, David Doermann
2003 ACM Transactions on Asian Language Information Processing  
We present an adaptive Hindi OCR system based on generalized Hausdorff image comparison, implemented as part of a rapidly retargetable language tool effort. The system comprises script identification, character segmentation, training sample creation, and character recognition. The OCR design, completed in one month, was applied to a complete Hindi-English bilingual dictionary (1083 pages) and to a collection of ideal images extracted from Hindi documents in PDF format. Experimental results show that character-level recognition accuracy can reach 88% for noisy images and 95% for ideal images. The presented method can also be extended to design OCR systems for other scripts.
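
The abstract names generalized Hausdorff image comparison without defining it. As a rough illustration only, the sketch below uses the common rank-based (partial) form of the Hausdorff distance, in which the largest nearest-neighbour distance is replaced by a chosen quantile so that a few noisy pixels do not dominate the score. Whether the paper uses exactly this form is an assumption, and the function names and the `fraction` parameter are hypothetical, not taken from the paper.

```python
import numpy as np


def foreground_points(image):
    """Return (row, col) coordinates of non-zero pixels in a binary image."""
    return np.argwhere(np.asarray(image) > 0)


def directed_partial_hausdorff(a, b, fraction=0.9):
    """Rank-based directed distance from point set a to point set b.

    For each point in a, take the distance to its nearest point in b,
    then return the value at the given fraction of the sorted distances.
    fraction=1.0 gives the classical directed Hausdorff distance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both point sets must be non-empty")
    diff = a[:, None, :] - b[None, :, :]          # all pairwise offsets
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    k = max(1, int(np.ceil(fraction * len(nearest))))
    return float(np.sort(nearest)[k - 1])


def generalized_hausdorff(img_a, img_b, fraction=0.9):
    """Symmetric rank-based distance between two binary images."""
    pa, pb = foreground_points(img_a), foreground_points(img_b)
    return max(directed_partial_hausdorff(pa, pb, fraction),
               directed_partial_hausdorff(pb, pa, fraction))


# Toy data (not from the paper): a vertical bar and a copy with one stray pixel.
template = np.zeros((5, 5), dtype=int)
template[:, 2] = 1
noisy = template.copy()
noisy[0, 0] = 1

classical = generalized_hausdorff(template, noisy, fraction=1.0)
robust = generalized_hausdorff(template, noisy, fraction=0.8)
```

On this toy pair the classical distance is driven entirely by the single stray pixel, while the rank-based variant ignores it. A recognizer built on this idea would compare a segmented character image against each stored training sample and pick the label with the smallest distance.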
doi:10.1145/979872.979875 fatcat:c7ktgcciy5dqbl4qw6hky5tbda