Restoring Chinese documents images based on text boundary lines

Hong Liu, Runwei Ding
2009 IEEE International Conference on Systems, Man and Cybernetics
Distortion commonly appears in document images when thick bound volumes are scanned. Scanned grayscale images suffer from two kinds of distortion: a shadow appears at the volume's spine area, and the words within the shadow are warped. In this paper, a novel method based on text boundary lines is presented for efficient restoration of warped scanned Chinese document images. We first detect on which side of the image the shadow lies using a row grayscale analysis method. The shadow is then removed by a modified Niblack's algorithm. To detect the warping, a text boundary line detection method is proposed. Finally, an adjustment method based on the text boundary lines is carried out to restore the warped words. Experiments are conducted on 400 varied scanned Chinese document images. The improvement in average character recall is 11.92% to 14.89%. The experiments show that the proposed restoration method is efficient for Chinese documents with both text and non-text regions.
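The first two steps of the abstract (locating the shadow side, then thresholding locally) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their modification to Niblack's algorithm or the details of the row grayscale analysis, so the code below uses the standard Niblack threshold (local mean plus k times local standard deviation) and a simple comparison of averaged row profiles as stand-ins. The function names `shadow_side` and `niblack_binarize`, and the defaults `window=15` and `k=-0.2`, are assumptions chosen for illustration.

```python
import numpy as np


def shadow_side(gray):
    """Guess which half of the page is darker (assumed stand-in for the
    paper's row grayscale analysis, which the abstract does not detail)."""
    # Average all rows into one gray-level profile across the page width.
    profile = gray.astype(np.float64).mean(axis=0)
    half = profile.size // 2
    return "left" if profile[:half].mean() < profile[half:].mean() else "right"


def niblack_binarize(gray, window=15, k=-0.2):
    """Standard (unmodified) Niblack thresholding.

    Returns a boolean mask that is True where a pixel is darker than
    T = local_mean + k * local_std, computed over a window x window
    neighbourhood. `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    g = gray.astype(np.float64)
    pad = window // 2
    padded = np.pad(g, pad, mode="reflect")
    # One (window x window) view per pixel of the original image.
    views = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_mean = views.mean(axis=(-2, -1))
    local_std = views.std(axis=(-2, -1))
    return g < local_mean + k * local_std
```

Because the threshold follows the local mean, a gradual darkening toward the spine shifts the threshold along with it, which is the usual motivation for local methods such as Niblack's over a single global threshold on shadowed pages.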
doi:10.1109/icsmc.2009.5346660 dblp:conf/smc/LiuD09 fatcat:7luthkowcva2tmgtpik2idt2i4