
iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing

Menbere Kina Tekleyohannes, Vladimir Rybalkin, Muhammad Mohsin Ghaffar, Javier Alejandro Varela, Norbert Wehn, Andreas Dengel
2021 Journal of Imaging  
An existing end-to-end OCR software called anyOCR achieves high recognition accuracy for historical documents.  ...  Optical character recognition (OCR) is typically applied to scanned historical archives to transcribe them from document images into machine-readable texts.  ...  Unlike many commercial and open-source OCR engines, the anyOCR system transcribes modern and historical documents with high accuracy.  ... 
doi:10.3390/jimaging7090175 pmid:34564101 pmcid:PMC8467298 fatcat:jsgsgmvfmrg3flzpkpeuoyvq2q

iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing

Menbere Kina Tekleyohannes, Vladimir Rybalkin, Muhammad Mohsin Ghaffar, Javier Alejandro Varela, Norbert Wehn, Andreas Dengel
2021 International Journal of Parallel Programming
System-on-Chip (SoC) based on anyOCR for digitizing historical documents.  ...  To transcribe historical archives into a machine-readable form, first, the documents are scanned, then OCR is applied.  ...  Experimental Setup and Results: For experimental purposes, we used the historical Latin document images dataset [6] in order to compare the reference anyOCR system with our hardware and optimized software  ... 
doi:10.1007/s10766-020-00690-y fatcat:mhfyemm23zg7vix7pyglej4wgu

A Robust Page Frame Detection Method for Complex Historical Document Images

Mohammad Reza, Md. Rakib, Syed Bukhari, Andreas Dengel
2019 Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods  
One well-known open-source OCR system, anyOCR (Bukhari et al., 2017), is specially developed for Latin-script historical documents but is also remarkable for contemporary and semi-structured documents  ...  Existing commercial OCR systems (like ABBYY and OmniPage) and open-source OCR systems (like OCRopus and Tesseract) have traditionally been optimized for contemporary  ... 
doi:10.5220/0007382405560564 dblp:conf/icpram/RezaRBD19 fatcat:beqm4is7ajckndbtken2n7wxdy

Graph-Based Keyword Spotting in Historical Documents Using Context-Aware Hausdorff Edit Distance

Michael Stauffer, Andreas Fischer, Kaspar Riesen
2018 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)  
Acknowledgment: The authors would like to thank the Siemens Postal, Parcel & Airport Logistics GmbH for funding this work.  ...  Conclusion: anyOCR is an open-source system that gives very good accuracy for standard document images such as pages from books, magazines, and so on.  ...  For both domains and for all five languages, the accuracy of the Google system is higher than that of Tesseract, a well-known and widely used open-source OCR system.  ... 
doi:10.1109/das.2018.31 dblp:conf/das/Stauffer0R18 fatcat:2r2cjpiitfcs5knjtqbfvcuwsi

Historical Document Processing: A Survey of Techniques, Tools, and Trends [article]

James P. Philips, Nasseh Tabrizi
2020 arXiv   pre-print
Within the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned an increasing volume of their historical document archives, the need to transcribe the full text  ...  Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars.  ...  Moreover, as an open source project, it is also an excellent choice for researchers and practitioners who hold to an open source ethos in their tools and research.  ... 
arXiv:2002.06300v2 fatcat:nxufntuk7famfph6ownyuys2py

'An eye for an aye' : linguistic and political backlash and conformity in eighteenth-century Scots [article]

Sarah Van Eyndhoven, University of Canterbury
the effects of social and political changes that were occurring during the eighteenth century in Scotland on the use of written Scots, focussing in particular upon authors who were known to have been for  ...  I take a quantitative sociolinguistic approach to historical data by utilising statistical techniques that examine linguistic variation in a data-driven manner.  ...  Yet, both commercial OCR systems (like Abbyy and OmniPage) and open-source programmes (like OCRopus and Tesseract) have traditionally been optimized for clean, contemporary texts rather than historical  ... 
doi:10.26021/5082 fatcat:hsbpcg2e6zdszdylwqcxufo2ju