Text line segmentation of historical documents: a survey

Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet
2006 International Journal on Document Analysis and Recognition  
There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents
... noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
doi:10.1007/s10032-006-0023-z fatcat:suqv7dkiwvaypd5hyn4jdiztgi