A structural signature based on texture for digitized historical book page categorization

Maroua Mehri, Pierre Heroux, Julien Lerouge, Petra Gomez-Kramer, Remy Mullot
2015 13th International Conference on Document Analysis and Recognition (ICDAR)
This article presents a structural signature based on texture for the characterization and categorization of digitized historical book pages. The proposed signature assumes no a priori knowledge of page layout or content, and is therefore applicable to a wide variety of ancient books. By integrating various low-level features (e.g. texture) that characterize the different page components (i.e. different text fonts or graphic regions) on the one hand, and [...]al information describing the page layout on the other hand, the proposed signature provides a rich and holistic description of the layout and content of the analyzed book pages. More precisely, the signature-based characterization approach consists of two stages. The first stage automatically extracts homogeneous regions. The second stage builds a graph-based page signature from the extracted homogeneous regions, reflecting the page's layout and content. This signature enables numerous applications for effectively managing a corpus or collection of books (e.g. information retrieval in digital libraries according to several criteria, or page categorization). To illustrate the effectiveness of the proposed page signature, a detailed experimental evaluation assesses two possible categorization applications: unsupervised page classification and page stream segmentation.
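
The second stage described above (a graph built on the extracted homogeneous regions) can be illustrated with a minimal sketch. The abstract does not specify how the graph is constructed, so everything here is an assumption made for illustration: the `Region` type, the integer `texture_label` standing in for a texture cluster, the bounding-box adjacency rule, and the `max_gap` threshold are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """Hypothetical homogeneous region: bounding box plus a texture cluster label."""
    x0: int
    y0: int
    x1: int
    y1: int
    texture_label: int


def box_gap(a, b):
    """Gap in pixels between two boxes; 0 if they touch or overlap."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0)
    return max(dx, dy)


def page_signature(regions, max_gap=20):
    """Return (nodes, edges): nodes map region index to texture label,
    edges link regions whose boxes lie within max_gap of each other
    (an assumed adjacency rule)."""
    nodes = {i: r.texture_label for i, r in enumerate(regions)}
    edges = {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(regions), 2)
        if box_gap(a, b) <= max_gap
    }
    return nodes, edges
```

In this sketch the node labels carry the content description (which texture class each region belongs to) and the edges carry the layout description (which regions are neighbours), so two pages could be compared by comparing their labelled graphs.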
doi:10.1109/icdar.2015.7333737 dblp:conf/icdar/MehriHLGM15 fatcat:t6hbzcgcufaixeej7degojwtb4