Handwritten and Machine Printed Text Separation in Document Images Using the Bag of Visual Words Paradigm

Konstantinos Zagoris, Ioannis Pratikakis, Apostolos Antonacopoulos, Basilis Gatos, Nikos Papamarkos
2012 International Conference on Frontiers in Handwriting Recognition
In a number of types of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may be present in the same document image, giving rise to significant issues within a digitisation and recognition pipeline. It is therefore necessary to separate the two types of text before applying different recognition methodologies to each. In this paper, a new approach is proposed which strives towards identifying and separating handwritten from machine printed text using the Bag of Visual Words paradigm (BoVW). Initially, blocks of interest are detected in the document image. For each block, a descriptor is calculated based on the BoVW. The final characterization of the blocks as Handwritten, Machine Printed or Noise is made by a Support Vector Machine classifier. The promising performance of the proposed approach is shown by using a consistent evaluation methodology which couples meaningful measures along with a new dataset.
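As a rough illustration of the per-block descriptor step mentioned in the abstract, the sketch below builds a Bag of Visual Words histogram for one block. It is not the authors' implementation: the abstract does not state which local features, codebook size, or assignment scheme the paper uses, so the hard nearest-word assignment, Euclidean distance, L1 normalisation, the function name `bovw_histogram`, and the toy 2-D data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bovw_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest visual word and
    return the L1-normalised word-frequency histogram (assumed scheme)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword closest to this descriptor.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An empty block yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: a 3-word codebook and 4 local descriptors from one block.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
block_descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 0.2)]
hist = bovw_histogram(block_descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

In a pipeline like the one the abstract outlines, a fixed-length histogram of this kind would be the feature vector passed to the classifier that labels each block as Handwritten, Machine Printed or Noise; in practice the codebook would typically be learned by clustering descriptors from training images rather than written by hand.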
doi:10.1109/icfhr.2012.207 dblp:conf/icfhr/ZagorisPAGP12 fatcat:vlsl3u4p6vgqnoq5sx2qmfaz2y