
Which OCR toolset is good and why? A comparative study

Pooja Jain (Dept. of Computer Science & Applications, Panjab University, Chandigarh, India), Kavita Taneja (Dept. of Computer Science & Applications, Panjab University, Chandigarh, India), Harmunish Taneja (Dept. of Computer Science & Information Tech., DAV College, Sec - 10, Chandigarh, India)
<span title="2021-04-05">2021</span> <i title="Kuwait Journal of Science"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/7r7osjshkjholmpcusnobdus5e" style="color: black;">Maǧallaẗ Al-Kuwayt li-l-ʿulūm</a> </i> &nbsp;
Many OCR toolsets are available under various categories, including open-source, proprietary, and online services.  ...  This research paper provides a comparative study of various OCR toolsets considering a variety of parameters.  ...  , provided high-quality OCR result with over 97% character-accuracy and around 92% word-accuracy on an early printed book (15th century) within a reasonable amount of time.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.48129/kjs.v48i2.9589">doi:10.48129/kjs.v48i2.9589</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/tgxguydolzdidlu4xk2vfrprza">fatcat:tgxguydolzdidlu4xk2vfrprza</a> </span>

OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
<span title="2019-11-13">2019</span> <i title="MDPI AG"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/smrngspzhzce7dy6ofycrfxbim" style="color: black;">Applied Sciences</a> </i> &nbsp;
In this paper, we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow.  ...  Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography.  ...  Acknowledgments: The authors would like to express their gratitude to the entire Narragonien digital work  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/app9224853">doi:10.3390/app9224853</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/3dd7pnyblrdq3e4lsjlodkd52y">fatcat:3dd7pnyblrdq3e4lsjlodkd52y</a> </span>

OCR4all – An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings [article]

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
<span title="2019-09-09">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow.  ...  Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography.  ...  Furthermore, we would like to thank the Opera Camerarii team around Thomas Baier, Marion Gindhart, Joachim Hamm, and Ulrich Schlegelmilch for providing a valuable and challenging use case and test object  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1909.04032v1">arXiv:1909.04032v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/czzg6o6i5baxdcnsc2cacm5xmy">fatcat:czzg6o6i5baxdcnsc2cacm5xmy</a> </span>

Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning [article]

Christian Reul, Christoph Wick, Maximilian Nöth, Andreas Büttner, Maximilian Wehner, Uwe Springmann
<span title="2021-06-15">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Evaluations on 29 previously unseen books resulted in a CER of 1.73%, outperforming a widely used standard model with a CER of 2.84% by almost 40%.  ...  Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30%  ...  [17, 18] studied the effectiveness of mixed and book-specific models on (very) early printed books relying on the OCRopus OCR engine: First, they performed experiments on a corpus consisting of twelve  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.07881v1">arXiv:2106.07881v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vw7nxcs2vnaynd7n6jo47hsowq">fatcat:vw7nxcs2vnaynd7n6jo47hsowq</a> </span>

Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs [article]

Matteo Romanello, Sven Najem-Meyer, Bruce Robertson
<span title="2021-10-13">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Together with critical editions and translations, commentaries are one of the main genres of publication in literary and textual scholarship, and have a century-long tradition.  ...  Yet, the exploitation of thousands of digitized historical commentaries was hitherto hindered by the poor quality of Optical Character Recognition (OCR), especially on commentaries to Greek texts.  ...  Second, OCR4all [14] is an open source OCR tool explicitly developed for users with no prior technical background, and especially those working on the earliest printed books.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2110.06817v1">arXiv:2110.06817v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jdcursc7vjcitpgnh6p74id32e">fatcat:jdcursc7vjcitpgnh6p74id32e</a> </span>

Optical character recognition with neural networks and post-correction with finite state methods

Senka Drobac, Krister Lindén
<span title="2020-08-20">2020</span> <i title="Springer Science and Business Media LLC"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/qlbwpi6y5ratlbyijgcz2zwany" style="color: black;">International Journal on Document Analysis and Recognition</a> </i> &nbsp;
The results show a significant boost in accuracy, resulting in 1.7% CER on the Finnish and 2.7% CER on the Swedish test set.  ...  There have been earlier attempts to train high-quality OCR models with open-source software, like Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract),  ...  as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s10032-020-00359-9">doi:10.1007/s10032-020-00359-9</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/cjonawcrebec7n34iapdg7frqu">fatcat:cjonawcrebec7n34iapdg7frqu</a> </span>

Survey of Post-OCR Processing Approaches

Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Antoine Doucet
<span title="2021-03-01">2021</span> <i title="Zenodo"> Zenodo </i> &nbsp;
Optical character recognition (OCR) is one of the most popular techniques used for converting printed documents into machine-readable ones.  ...  We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches.  ...  Motivated by the great success of SMT on post-processing OCR output and the development of NMT, many recent approaches [6, 58, 106, 109, 140] apply them to fix OCR errors. Multiple open sources of NMT are  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5281/zenodo.4635569">doi:10.5281/zenodo.4635569</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/x5qoluap7rgyxakv5lm5qcysya">fatcat:x5qoluap7rgyxakv5lm5qcysya</a> </span>

OCR Processing of Swedish Historical Newspapers Using Deep Hybrid CNN–LSTM Networks

Molly Brandt Skelbye (Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, 412 96 Gothenburg, Sweden), Dana Dannélls (Språkbanken Text, Department of Swedish, University of Gothenburg, 405 30 Gothenburg, Sweden)
<span title="">2021</span> <i title="INCOMA Ltd. Shoumen, BULGARIA"> Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications </i> &nbsp; <span class="release-stage">unpublished</span>
By experimenting with the open source OCR engine Calamari, we are able to show that mixed deep CNN-LSTM hybrid models outperform previous models on the task of character recognition of Swedish historical  ...  In this paper we examine to what extent these networks improve the OCR accuracy rates on Swedish historical newspapers.  ...  Acknowledgments This work has been funded by the Swedish Research Council as part of the project Evaluation and refinement of an enhanced OCR-process for mass digitisation (2019-2020; dnr IN18-0940:1).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.26615/978-954-452-072-4_023">doi:10.26615/978-954-452-072-4_023</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/4r5u5zxg7vch3anxj4yakvbt3u">fatcat:4r5u5zxg7vch3anxj4yakvbt3u</a> </span>

Book of Abstracts of the Digital Humanities in the Nordic Countries 5th conference. Riga, 20–23 October 2020 [article]

Sanita Reinsone, Anda Baklāne, Jānis Daugavietis
<span title="2020-10-19">2020</span> <i title="Zenodo"> Zenodo </i> &nbsp;
Book of Abstracts DHN, Rīga 2020 Book of Abstracts of the Digital Humanities in the Nordic Countries 5th conference.  ...  Literature, Folklore and Art (University of Latvia) lulfmi.lv Rīga, 2020 ISBN 978-9984-850-83-2 DOI 10.5281/zenodo.4107117  ...  We also thank the library of the Technische Acknowledgements This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant 770299 (NewsEye).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5281/zenodo.4107117">doi:10.5281/zenodo.4107117</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6ongky6p5rab7gvtawnjmp2ofm">fatcat:6ongky6p5rab7gvtawnjmp2ofm</a> </span>

Open Source Handwritten Text Recognition on Medieval Manuscripts using Mixed Models and Document-Specific Finetuning [article]

Christian Reul, Stefan Tomasek, Florian Langhanki, Uwe Springmann
<span title="2022-01-01">2022</span>
This paper deals with the task of practical and open source Handwritten Text Recognition (HTR) on German medieval manuscripts.  ...  To train the mixed models we collected a corpus of 35 manuscripts and ca. 12.5k text lines for two widely used handwriting styles, Gothic and Bastarda cursives.  ...  as well as Maximilian Nöth and Maximilian Wehner for supporting the data preparation.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.48550/arxiv.2201.07661">doi:10.48550/arxiv.2201.07661</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6maag6ppdnhsnmmbpmgfo76qfe">fatcat:6maag6ppdnhsnmmbpmgfo76qfe</a> </span>