
Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning [article]

Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
2018 arXiv   pre-print
Evaluations on six early printed books yielded the following results: On average the combination of pretraining and voting improved the character accuracy by 46% when training five folds starting from  ...  We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models  ...  (Improving OCR Accuracy on Early Printed Books using Deep Convolutional Networks; submitted to this issue.).  ... 
arXiv:1802.10038v2 fatcat:oudtxdhjsjan3ijp2ykw4dsgbi

Improving OCR Accuracy on Early Printed Books using Deep Convolutional Networks [article]

Christoph Wick, Christian Reul, Frank Puppe
2018 arXiv   pre-print
This paper proposes a combination of a convolutional and a LSTM network to improve the accuracy of OCR on early printed books.  ...  Hereby, the error is reduced by a factor of up to 44%  ...  voting mechanism to achieve character error rates (CER) below 0.5%  ...  runtime of the deep model for training and prediction of a book behaves very similar  ...  For further improvements, we propose transfer learning by using a pretrained model as initial instance and a following finetuning on a specific book.  ... 
arXiv:1802.10033v1 fatcat:terzx26rqjf3dg57663szuu5uy

OCR4all – An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings [article]

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
2019 arXiv   pre-print
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography.  ...  In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow.  ...  Furthermore, we would like to thank the Opera Camerarii team around Thomas Baier, Marion Gindhart, Joachim Hamm, and Ulrich Schlegelmilch for providing a valuable and challenging use case and test object  ... 
arXiv:1909.04032v1 fatcat:czzg6o6i5baxdcnsc2cacm5xmy

OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
2019 Applied Sciences  
Furthermore, on very complex early printed books, even users with minimal or no experience were able to capture the text with manageable effort and great quality, achieving excellent Character Error Rates  ...  pretrained mixed OCR models are available.  ...  Calamari's training and recognition capabilities combined with the easy-to-use ITA provided by OCR4all allow the users to utilize state-of-the-art deep learning software and accuracy improving techniques  ... 
doi:10.3390/app9224853 fatcat:3dd7pnyblrdq3e4lsjlodkd52y

Optical character recognition with neural networks and post-correction with finite state methods

Senka Drobac, Krister Lindén
2020 International Journal on Document Analysis and Recognition  
Furthermore, we revisit the effect of confidence voting on the OCR results with different model combinations. Finally, we perform post-correction on the new OCR results and perform error analysis.  ...  The greatest accomplishment of the study is the successful training of one mixed language model for the entire corpus and finding a voting setup that further improves the results.  ...  If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly  ... 
doi:10.1007/s10032-020-00359-9 fatcat:cjonawcrebec7n34iapdg7frqu

Deep Learning for Historical Document Analysis and Recognition—A Survey

Francesco Lombardi, Simone Marinai
2020 Journal of Imaging  
Guided by these tasks, we go through the different input-output relations that are expected from the used deep learning approaches and therefore we accordingly describe the most used models.  ...  Nowadays, deep learning methods are employed in a broad range of research fields. The analysis and recognition of historical documents, as we survey in this work, is not an exception.  ...  After Gutenberg introduction in Europe of movable printing, early printed books became more and more popular [21].  ... 
doi:10.3390/jimaging6100110 pmid:34460551 pmcid:PMC8321201 fatcat:nevh2ctshzfwtey4girgjtaftq

Named Entity Recognition and Classification on Historical Documents: A Survey [article]

Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet
2021 arXiv   pre-print
In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future  ...  While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop  ...  ACKNOWLEDGMENTS The work of Maud Ehrmann and Matteo Romanello was supported by the Swiss National Science Foundation under the grants number CR-SII5_173719 (Impresso - Media Monitoring of the Past) and  ... 
arXiv:2109.11406v1 fatcat:zbwoybklk5bjrlf2b67qm6t7e4

Historical Document Image Binarization: A Review

Chris Tensmeyer, Tony Martinez
2020 SN Computer Science  
Besides the standard methods for image thresholding, preprocessing, and post-processing, we review the literature on methods such as statistical models, pixel classification with learning algorithms, and  ...  This review provides a comprehensive view of the field of historical document image binarization with a focus on the contributions made in the last decade.  ...  The pretraining greatly improves model performance to achieve state-of-the-art results on DIBCO17.  ... 
doi:10.1007/s42979-020-00176-1 fatcat:vgn3kw3asjewjnjpfkwnnudry4

Capitalization and punctuation restoration: a survey

Vasile Păiş, Dan Tufiş
2021 Artificial Intelligence Review  
Additionally, short text messages and micro-blogging platforms offer unreliable and often wrong punctuation and casing.  ...  This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing.  ...  Furthermore, the accuracy of the models can be further increased by using an ensemble mechanism based on a simple voting between the three implemented models.  ... 
doi:10.1007/s10462-021-10051-x fatcat:j4blakzh5rew3iljtytpcnnc4q

Open Source Handwritten Text Recognition on Medieval Manuscripts using Mixed Models and Document-Specific Finetuning [article]

Christian Reul, Stefan Tomasek, Florian Langhanki, Uwe Springmann
2022
After training on 2, 4 and eventually 32 pages the CER dropped to 3.27%, 2.58%, and 1.65%, respectively.  ...  We report on our efforts to construct mixed recognition models which can be applied out-of-the-box without any further document-specific training but also serve as a starting point for finetuning by training  ...  This work was partially funded by the German Research Foundation (DFG) under project no. 460665940.  ... 
doi:10.48550/arxiv.2201.07661 fatcat:6maag6ppdnhsnmmbpmgfo76qfe

Program of the Sixty-Second Annual Convention of the American Psychological Association: Abstracts of Papers

No authorship indicated
1954 American Psychologist  
Selections were made by touching metal discs with a stylus or by reporting code printed on paper discs. Both methods of response were combined with the square and circular panels.  ...  Half of the Ss in each of the pretraining groups relearned under stress.  ... 
doi:10.1037/h0058777 fatcat:w4kwdxup55htdomz4vjdfini7m

Women's Work, Men's Work: Sex Segregation on the Job

Mary Ruggie, Barbara F. Reskin, Heidi I. Hartmann
1986 Contemporary Sociology  
Acknowledgments The Determinants and Consequences of Occupational Information for Young Women.  ...  The dramatic effect of the passage and enforcement of the 1965 Voting Rights Act on voting by blacks (U.S.  ...  version for accuracy.  ... 
doi:10.2307/2071043 fatcat:42pjf54yofab5ahcmcsaksjuri

Computing, memory and writing: some reflections on an early experiment in digital literary studies [chapter]

Giorgio Guzzetta, Federico Nanni
Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015  
loaded on TLA platform.  ...  Acknowledgments This work has been partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action) and by an IBM Faculty Award.  ...  Parser combinations, either stacking or voting, can be quite effective in improving accuracy of individual parsers, as proved in the Evalita 2014 shared task and confirmed by our own experiments also on  ... 
doi:10.4000/books.aaccademia.1491 fatcat:xvp7vek3ana23l4di23uayhuom

An Active Learning Approach to the Classification of Non-Sentential Utterances [chapter]

Paolo Dragone, Pierre Lison
Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015  
doi:10.4000/books.aaccademia.1464 fatcat:vrcz7tugdjcajhrjra7mgej6hi

Enhancing the Accuracy of Ancient Greek WordNet by Multilingual Distributional Semantics [chapter]

Yuri Bizzoni, Riccardo Del Gratta, Federico Boschetti, Marianne Reboul
Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015  
doi:10.4000/books.aaccademia.1312 fatcat:oau7essk5bh4xosr6piiiyz6hi