iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing
2021
Journal of Imaging
An existing end-to-end OCR software called anyOCR achieves high recognition accuracy for historical documents. ...
Optical character recognition (OCR) is typically applied to scanned historical archives to transcribe them from document images into machine-readable texts. ...
Unlike many commercial and open-source OCR engines, the anyOCR system transcribes modern and historical documents with high accuracy. ...
doi:10.3390/jimaging7090175
pmid:34564101
pmcid:PMC8467298
fatcat:jsgsgmvfmrg3flzpkpeuoyvq2q
iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing
2021
International journal of parallel programming
System-on-Chip (SoC) based on anyOCR for digitizing historical documents. ...
To transcribe historical archives into a machine-readable form, first, the documents are scanned, then OCR is applied. ...
Experimental Setup and Results: For experimental purposes, we used the historical Latin document images dataset [6] in order to compare the reference anyOCR system with our hardware and optimized software ...
doi:10.1007/s10766-020-00690-y
fatcat:mhfyemm23zg7vix7pyglej4wgu
A Robust Page Frame Detection Method for Complex Historical Document Images
2019
Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods
One known open-source OCR system, anyOCR (Bukhari et al., 2017), was developed specifically for Latin-script historical documents, but is also remarkable for contemporary and semi-structured documents ...
Existing commercial OCR systems (like ABBYY [https://www.abbyy.com/] and OmniPage) and open-source OCR systems (like OCRopus and Tesseract) have traditionally been optimized for contemporary ...
doi:10.5220/0007382405560564
dblp:conf/icpram/RezaRBD19
fatcat:beqm4is7ajckndbtken2n7wxdy
Graph-Based Keyword Spotting in Historical Documents Using Context-Aware Hausdorff Edit Distance
2018
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
ACKNOWLEDGMENT: The authors would like to thank Siemens Postal, Parcel & Airport Logistics GmbH for funding this work. ...
CONCLUSION: anyOCR is an open-source system that gives very good accuracy for standard document images such as pages from books and magazines. ...
For both domains and for all five languages, accuracy of the Google system is higher than that of Tesseract, a well-known and widely used open-source OCR system.
doi:10.1109/das.2018.31
dblp:conf/das/Stauffer0R18
fatcat:2r2cjpiitfcs5knjtqbfvcuwsi
Historical Document Processing: A Survey of Techniques, Tools, and Trends
[article]
2020
arXiv
pre-print
Within the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned an increasing volume of their historical document archives, the need to transcribe the full text ...
Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. ...
Moreover, as an open source project, it is also an excellent choice for researchers and practitioners who hold to an open source ethos in their tools and research. ...
arXiv:2002.06300v2
fatcat:nxufntuk7famfph6ownyuys2py
'An eye for an aye' : linguistic and political backlash and conformity in eighteenth-century Scots
[article]
2018
the effects of social and political changes that were occurring during the eighteenth century in Scotland on the use of written Scots, focussing in particular upon authors who were known to have been for ...
I take a quantitative sociolinguistic approach to historical data by utilising statistical techniques that examine linguistic variation in a data-driven manner. ...
Yet, both commercial OCR systems (like Abbyy and OmniPage) and open-source programmes (like OCRopus and Tesseract) have traditionally been optimized for clean, contemporary texts rather than historical ...
doi:10.26021/5082
fatcat:hsbpcg2e6zdszdylwqcxufo2ju