Comparing Approaches to Mathematical Document Analysis from PDF

Josef B. Baker, Alan P. Sexton, Volker Sorge, Masakazu Suzuki
2011 International Conference on Document Analysis and Recognition, 2011
Document analysis of mathematical texts is a challenging problem even for born-digital documents in standard formats. We present alternative approaches addressing this problem in the context of PDF documents. One uses an OCR approach for character recognition together with a virtual link network for structural analysis. The other uses direct extraction of symbol information from the PDF file with a two-stage parser to extract layout and expression structures. With reference to ground truth ... we compare the effectiveness and accuracy of the two techniques quantitatively with respect to character identification and structural analysis of mathematical expressions and qualitatively with respect to layout analysis.
doi:10.1109/icdar.2011.99 dblp:conf/icdar/BakerSSS11 fatcat:snqynbc4yje3ldz3hbxic3rhoi