A document image analysis system on parallel processors

S. Sural, P.K. Das
Proceedings Fourth International Conference on High-Performance Computing  
This paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage is first used to correct skew in scanned document images. The corrected image is segmented and labelled in a two-step Minimum Containing Rectangle (MCR) detection stage. Text Block Filtering (TBF) is then done heuristically, and the filtered blocks are submitted to a Multi-Layer Perceptron (MLP) for character recognition. Smoothing of the document image is done during MLP-based character recognition to reduce the pre-processing time. It also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. During recognition the MLP identifies bold words, which are used for automatic indexing of documents. Data is partitioned by exploiting the inherent parallelism in document image data. Communication overhead is small compared to the computation time, so a high degree of parallelization is achieved, reducing the total execution time.
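The abstract does not say how the image data is partitioned, so the following is only an illustrative sketch of one common scheme, not the authors' method. It assumes a binary image stored as a list of rows, splits it into contiguous horizontal strips, and processes each strip independently. The names `split_rows`, `ink_per_row`, and `parallel_row_profile` are hypothetical, a thread pool stands in for the parallel processors, and a per-row foreground pixel count stands in for the real per-partition work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges covering n_rows as evenly as possible."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional row.
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty ranges when there are more workers than rows
            ranges.append((start, stop))
        start = stop
    return ranges


def ink_per_row(strip):
    """Count foreground (1) pixels in each row of a strip (placeholder workload)."""
    return [sum(row) for row in strip]


def parallel_row_profile(image, n_workers=4):
    """Partition the image into row strips and process the strips concurrently."""
    strips = [image[a:b] for a, b in split_rows(len(image), n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so the results concatenate back in row order.
        parts = list(pool.map(ink_per_row, strips))
    return [count for part in parts for count in part]


if __name__ == "__main__":
    image = [
        [0, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [0, 1, 0],
        [0, 0, 0],
    ]
    print(parallel_row_profile(image, n_workers=2))
```

In this sketch each strip is handled without reference to its neighbours, so the only coordination cost is distributing the strips and gathering the results. That matches the abstract's point that communication overhead is small relative to computation, although a real system would also need to handle components that straddle strip boundaries.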
doi:10.1109/hipc.1997.634542 dblp:conf/hipc/SuralD97 fatcat:h3kxphfhhrc4he3gwhzpr6vvwq