OVER-SPLITTED AND MERGED FOR GEOMETRY DOCUMENT LAYOUT ANALYSIS

Ha Dai Ton, Nguyen Duc Dung, Le Duc Hieu
2016 FAIR - NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN 2015   unpublished
Automatic transformation of paper documents into electronic form requires geometric document layout analysis as the first stage. However, variations in character font sizes, text-line spacing, and layout structures have made it difficult to design a general-purpose method. The use of some parameters has therefore been unavoidable in geometric document layout analysis algorithms, and this leads to over-segmentation and under-segmentation errors in previous algorithms. This paper presents a new approach to geometric document layout analysis. Our algorithm uses a set of whitespace rectangles covering the document background to reduce the page to candidate zones, some of which may be over-segmented. A bottom-up procedure then groups the over-segmented zones with one another based on adaptive parameters. Finally, we propose context analysis at the text-line level to segment document images into paragraphs. Experimental results on the ICDAR2009 competition data set show that the proposed algorithm greatly reduces both over-segmentation and under-segmentation errors, and thus significantly boosts performance compared with state-of-the-art algorithms.
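The bottom-up grouping step can be illustrated with a minimal sketch. The abstract does not give the paper's actual merging criteria, so everything here is an assumption made for illustration: zones are axis-aligned boxes, the adaptive parameter is taken to be the median zone height on the page, and the names `merge_zones`, `gap_factor`, and `overlap_ratio` are hypothetical.

```python
from statistics import median

# A zone is a bounding box (x0, y0, x1, y1) in pixels, with y growing downward.

def h_overlap(a, b):
    """Length of the horizontal overlap between two boxes (0 if none)."""
    return max(0, min(a[2], b[2]) - max(a[0], b[0]))

def v_gap(a, b):
    """Vertical gap between two boxes (0 if they overlap vertically)."""
    return max(0, max(a[1], b[1]) - min(a[3], b[3]))

def union(a, b):
    """Smallest box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_zones(zones, gap_factor=1.0, overlap_ratio=0.5):
    """Greedy bottom-up merge of over-segmented zones.

    Assumption (not from the paper): two zones are merged when they overlap
    horizontally by at least overlap_ratio of the narrower zone and their
    vertical gap is at most gap_factor times the median zone height. The
    threshold is "adaptive" only in that it is computed from the page itself.
    """
    zones = list(zones)
    if not zones:
        return []
    threshold = gap_factor * median(z[3] - z[1] for z in zones)
    merged = True
    while merged:  # repeat until a full pass finds nothing left to merge
        merged = False
        for i in range(len(zones)):
            for j in range(i + 1, len(zones)):
                a, b = zones[i], zones[j]
                min_width = min(a[2] - a[0], b[2] - b[0])
                if (h_overlap(a, b) >= overlap_ratio * min_width
                        and v_gap(a, b) <= threshold):
                    zones[i] = union(a, b)
                    del zones[j]
                    merged = True
                    break
            if merged:
                break
    # Return zones in reading order: top to bottom, then left to right.
    return sorted(zones, key=lambda z: (z[1], z[0]))
```

Deriving the threshold from the page's own zone heights, rather than fixing it in pixels, is one simple way to avoid a hand-tuned parameter; the paper's adaptive parameters may be defined quite differently.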
doi:10.15625/vap.2015.000191 fatcat:zabg2nvaafczbkkoxxpvgr73iy