Two complementary techniques for digitized document analysis

George Nagy, Junichi Kanai, Mukkai Krishnamoorthy, Mathews Thomas, Mahesh Viswanathan
1988 Proceedings of the ACM conference on Document processing systems - DOCPROCS '88  
Two complementary methods are proposed for characterizing the spatial structure of digitized technical documents and labelling various logical components without using optical character recognition. The top-down method segments and labels the page image simultaneously using publication-specific information in the form of a page-grammar. The bottom-up method naively segments the document into rectangles that contain individual connected components, combines blocks using knowledge about generic layout objects, and identifies logical objects using publication-specific knowledge. Both methods are based on the X-Y tree representation of a page image. The procedures are demonstrated on scanned and synthesized bit-maps of the title pages of technical articles.
doi:10.1145/62506.62539 fatcat:mj7vn652orezzczcuxacldd4zu