The OCRopus open source OCR system
2008
Document Recognition and Retrieval XV
OCRopus is a new, open source OCR system emphasizing modularity, easy extensibility, and reuse, aimed at both the research community and large-scale commercial document conversions. This paper describes the current status of the system, its general architecture, and the major algorithms currently used for layout analysis and text line recognition.
Excerpt from the full text: Above, we saw generally how the processing steps of the OCRopus system fit together. Let us now look at each of the processing steps in more detail.
doi:10.1117/12.783598
dblp:conf/drr/Breuel08
fatcat:k4cdglpamvee7ajcmrarop66bq