OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion [article]

Youssef Bassil, Mohammad Alwani
2012 arXiv   pre-print
This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors.  ...  The proposed algorithm is based on Google's online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web, convenient  ...  ACKNOWLEDGMENTS This research was funded by the Lebanese Association for Computational Sciences (LACSC), Beirut, Lebanon under the "Web-Scale OCR Research Project -WSORP2011".  ... 
arXiv:1204.0191v1 fatcat:azx563fb4fhsliys7cvuw6jbay

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set [article]

Youssef Bassil, Mohammad Alwani
2012 arXiv   pre-print
This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors.  ...  The cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text.  ...  Acknowledgment This research was funded by the Lebanese Association for Computational Sciences (LACSC), Beirut, Lebanon under the "Web-Scale OCR Research Project -WSORP2011".  ... 
arXiv:1204.0188v1 fatcat:rfudthyyk5hujiromjlru5hu5e

Utilizing web data in identification and correction of OCR errors

Kazem Taghva, Shivam Agarwal, Bertrand Coüasnon, Eric K. Ringger
2013 Document Recognition and Retrieval XXI  
In particular, we point out the shortcomings of our approach in its ability to suggest proper candidates to correct the remaining errors.  ...  We then use a combination of the Longest Common Subsequences (LCS) and Bayesian estimates to automatically pick the proper candidate.  ...  There is a current research on a new post-processing method and algorithm for OCR error correction, based on huge database of Google's online web search engine.  ... 
doi:10.1117/12.2042403 dblp:conf/drr/TaghvaA14 fatcat:fynbvw6i4nf2joowpiymtlh3ju

Vartani Spellcheck – Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance [article]

Aditya Pal, Abhijit Mustafi
2020 arXiv   pre-print
Automatic spelling error detection and context-sensitive error correction can be used to improve accuracy by post-processing the text generated by these OCR systems.  ...  A majority of previously developed language models for error correction of Hindi spelling have been context-free.  ...  ACKNOWLEDGMENT We are extremely thankful to Google Colaboratory [24] and their powerful hardware which was used to run our BERT models.  ... 
arXiv:2012.07652v1 fatcat:u55lueliknbrhn3m4uvhxrw4qi

Multi-modular domain-tailored OCR post-correction

Sarah Schulz, Jonas Kuhn
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
Texts have to be digitized in an expensive and time consuming process whereas Optical Character Recognition (OCR) post-correction is one of the time-critical factors.  ...  At the example of OCR post-correction, we show the adaptation of a generic system to solve a specific problem with little data.  ...  Bassil and Alwani (2012) use Google's online spelling suggestions for as they draw on a huge lexicon based on contents gathered from all over the web.  ... 
doi:10.18653/v1/d17-1288 dblp:conf/emnlp/SchulzK17 fatcat:fzsxrwqa4bgdxizbcwk3hbxxju

Survey of Post-OCR Processing Approaches

Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Antoine Doucet
2021 Zenodo  
We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches.  ...  Additionally, many texts have been already processed by various out-of-date digitisation techniques. As a consequence, digitised texts are noisy and need to be post-corrected.  ...  The local profile provides a ranked list of possible correction suggestions for each OCR token by using the ground truth and historical spelling variations.  ... 
doi:10.5281/zenodo.4640070 fatcat:6jnyehazujadvejgls6vpnu6ta

Survey of Post-OCR Processing Approaches

Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Antoine Doucet
2021 Zenodo  
We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches.  ...  Additionally, many texts have been already processed by various out-of-date digitisation techniques. As a consequence, digitised texts are noisy and need to be post-corrected.  ...  If the query contains errors, the search engine will suggest some replaceable words for misspellings. These suggestions are used as corrections for OCR errors.  ... 
doi:10.5281/zenodo.4635569 fatcat:x5qoluap7rgyxakv5lm5qcysya

An Improved Text Extraction Approach with Auto Encoder for Creating Your Own Audiobook

2022 International Journal of Information Retrieval Research  
Additional improvements are made to improve the quality of text extraction and post processing spell check mechanism are incorporated for this purpose.  ...  This is followed by text extraction with the help of OCR engines.  ...  Following this, we have designed post processing steps to identify the errors and suggest suitable alternatives.  ... 
doi:10.4018/ijirr.289570 fatcat:zjmtlsoxzveuxfn5cw2dzv6wka

Corpus-based technique for improving Arabic OCR system

Ahmed Hussain Aliwy, Basheer Al-Sadawi
2021 Indonesian Journal of Electrical Engineering and Computer Science  
OCR process poses several challenges in particular in the Arabic language due to it has caused a high percentage of errors.  ...  This method includes detecting and correcting non-word and real words error according to the context of the word in the sentence.  ...  Doush and Al-Trad (2016) [16] Developed AOCR post-processing system based three strategies Google online suggestion system, Ayaspell spell checker with Google online suggestion system and Microsoft Office  ... 
doi:10.11591/ijeecs.v21.i1.pp233-241 fatcat:qhvklmpwzveddatdyw5w5tv6w4

A Rule-Based Post-Processing Approach to Improve Persian OCR Performance

Zohreh Khosrobeygi, Hadi Veisi, Sayyed Hamid Reza Ahmadi, Hanieh Shabanian
2020 Scientia Iranica. International Journal of Science and Technology  
This paper proposes a Persian OCR post-processing technique to increase the accuracy of the OCR systems dealing with real-world challenging samples.  ...  The accuracy of Khana and Bina in images with a complex-structure is 39% and 58%, respectively, while after applying the proposed post-processing method, the accuracy increases to 93% and 91%, respectively  ...  [24] , used the Google online spelling suggestion to get common spelling suggestions on the English and Arabic languages.  ... 
doi:10.24200/sci.2020.53435.3267 fatcat:nqgv6grl7bcchapr5pdgrs3tvq

A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check

Dingmin Wang, Yan Song, Jing Li, Jialong Han, Haisong Zhang
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing in many natural language processing (NLP) applications, but also facilitates reading and understanding  ...  In this paper, we propose a novel approach of constructing CSC corpus with automatically generated spelling errors, which are either visually or phonologically resembled characters, corresponding to the  ...  Acknowledgements The authors want to express special thanks to Xixin Wu for his suggestions and help in the experiment of ASR.  ... 
doi:10.18653/v1/d18-1273 dblp:conf/emnlp/WangSLHZ18 fatcat:2oqyi2fleff5lftjig4xvpedgy

On OCR ground truths and OCR post-correction gold standards, tools and formats

Martin Reynaert
2014 Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14  
Connecting the dots, we discuss the difference we perceive between OCR ground truths and OCR post-correction gold standards and their respective contributions.  ...  We give an overview of activities undertaken in the sidelines of our automatic OCR post-correction core business over the past few years.  ...  processing system' TICCLops 7 online.  ... 
doi:10.1145/2595188.2595216 dblp:conf/datech/Reynaert14 fatcat:ctwkfw2jxbepdcqsendjy7a3ya

Image-based mobile service: automatic text extraction and translation

Jérôme Berclaz, Nina Bhatti, Steven J. Simske, John C. Schettino, Reiner Creutzburg, David Akopian
2010 Multimedia on Mobile Devices 2010  
The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result.  ...  For a definition of cloud computing, see http://en.wikipedia.org/wiki/Cloud_computing As illustrated on Fig. 1 , the translation process is initiated by a user sending a picture by MMS (a).  ...  The usage of a spell correction algorithm specifically designed for post-processing OCR output is likely to be more powerful than the generic spell checker we use in our prototype.  ... 
doi:10.1117/12.840279 fatcat:nko3t7gvrndbzja7o66r2cbqbi

A Survey on Easy OCR Techniques used to build Systems for Visually Impaired People

Aishwarya Karnawat
2018 International Journal for Research in Applied Science and Engineering Technology  
Various tools, algorithms and implementations are obtainable to discover characters from pictures.  ...  This survey paper presents a quick summary on favoured OCR techniques like matrix matching, feature extraction; neural network primarily based OCR and discusses OCR software system Tesseract.  ...  Post-processing Error correction, descriptive linguistics correction, spell check etc. are exhausted this step. II. HISTORY 1) 1870s -C. R.  ... 
doi:10.22214/ijraset.2018.1216 fatcat:zk7qkt5xnrbulagi6aoj3c46hm

Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use

Leanne Olson, Veronica Berry
2021 Code4Lib Journal  
The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing.  ...  This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software.  ...  Evaluating the accuracy of OCR for a project in advance also helps to estimate the time needed to correct errors after OCR (Clausner 2020).  ... 
doaj:9fa148ca8abb4fb191d95a14866d52e2 fatcat:gboglyyzuvey3lt5accs54t43q