The OCRopus open source OCR system

Thomas M. Breuel, Berrin A. Yanikoglu, Kathrin Berkner
2008 Document Recognition and Retrieval XV  
OCRopus is a new, open source OCR system emphasizing modularity, easy extensibility, and reuse, aimed at both the research community and large-scale commercial document conversions. This paper describes the current status of the system, its general architecture, and the major algorithms currently used for layout analysis and text line recognition.
doi:10.1117/12.783598 dblp:conf/drr/Breuel08 fatcat:k4cdglpamvee7ajcmrarop66bq