Robust text and drawing segmentation algorithm for historical documents

Rafi Cohen, Abedelkadir Asi, Klara Kedem, Jihad El-Sana, Itshak Dinstein
2013 Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing - HIP '13  
We present a method to segment historical document images into regions of different content. First, we segment text elements from non-text elements using a binarized version of the document. Then, we refine the segmentation of the non-text regions into drawings, background and noise. At this stage, spatial and color features are exploited to guarantee coherent regions in the final segmentation. Experiments show that the suggested approach achieves better segmentation quality with respect to ... r methods. We examine the segmentation quality on 252 pages of a historical manuscript, for which the suggested method achieves about 92% and 90% segmentation accuracy of drawings and text elements, respectively.
doi:10.1145/2501115.2501117 dblp:conf/icdar/CohenAKED13 fatcat:7q3mhoqtfrfwld4vqjqdqpsq34
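
The two-stage idea in the abstract (binarize, then separate text from non-text and sort the remainder into drawings and noise) can be illustrated with a very small sketch. This is not the authors' algorithm: it stands in a global threshold for the binarization step and uses connected-component area as the only spatial feature, ignoring the color features the abstract mentions. The function names and the values `threshold`, `min_area` and `max_text_area` are hypothetical and chosen only for illustration.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Global threshold (an assumption): 1 = dark ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Return the 4-connected foreground components as lists of (row, col)."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def classify(component, min_area=2, max_text_area=6):
    """Label a component by pixel count alone (hypothetical thresholds)."""
    area = len(component)
    if area < min_area:
        return "noise"
    if area <= max_text_area:
        return "text"
    return "drawing"


def segment(gray, threshold=128):
    """Binarize the page, then label every ink component."""
    binary = binarize(gray, threshold)
    return [(classify(comp), comp) for comp in connected_components(binary)]
```

A real system would replace the area rule with richer spatial and color descriptors and would also model the background explicitly; the sketch only shows how a binarized page can be split into candidate regions that are then assigned to content classes.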