OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion
[article]
2012
arXiv
pre-print
This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors. ...
The proposed algorithm is based on Google's online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web, convenient ...
ACKNOWLEDGMENTS This research was funded by the Lebanese Association for Computational Sciences (LACSC), Beirut, Lebanon under the "Web-Scale OCR Research Project -WSORP2011". ...
arXiv:1204.0191v1
fatcat:azx563fb4fhsliys7cvuw6jbay
OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set
[article]
2012
arXiv
pre-print
This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. ...
The cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. ...
Acknowledgment This research was funded by the Lebanese Association for Computational Sciences (LACSC), Beirut, Lebanon under the "Web-Scale OCR Research Project -WSORP2011". ...
arXiv:1204.0188v1
fatcat:rfudthyyk5hujiromjlru5hu5e
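The entry above uses n-gram counts as a spell-checking dictionary. The sketch below illustrates that general idea and is not the authors' algorithm. It makes three assumptions:
- The toy `NGRAM_COUNTS` table is a hypothetical stand-in for the Google Web 1T 5-gram data.
- Candidates are generated with single-character edits.
- A token missing from the unigram counts is treated as a non-word error, and its candidates are ranked by how often they follow the two preceding words.

```python
import string
from collections import Counter

# Hypothetical toy counts standing in for Google Web 1T n-gram data (assumption).
NGRAM_COUNTS = Counter({
    ("the",): 1000, ("cat",): 200, ("sat",): 150, ("on",): 900,
    ("mat",): 80, ("hat",): 60,
    ("the", "cat", "sat"): 40, ("sat", "on", "the"): 30,
    ("on", "the", "mat"): 25, ("on", "the", "hat"): 5,
})


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [head + tail[1:] for head, tail in splits if tail]
    transposes = [head + tail[1] + tail[0] + tail[2:] for head, tail in splits if len(tail) > 1]
    replaces = [head + c + tail[1:] for head, tail in splits if tail for c in letters]
    inserts = [head + c + tail for head, tail in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct_non_words(tokens: list[str]) -> list[str]:
    """Replace tokens absent from the unigram counts with the best-scoring nearby word."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if NGRAM_COUNTS[(tok,)] > 0:
            continue  # found in the unigram "dictionary": not a non-word error
        candidates = [c for c in edits1(tok) if NGRAM_COUNTS[(c,)] > 0]
        if not candidates:
            continue  # no known word within one edit; leave the token unchanged
        left = tuple(out[max(0, i - 2):i])  # up to two preceding (possibly corrected) words
        # Rank by context n-gram count, then unigram count, then the string itself as a tie-breaker.
        out[i] = max(candidates, key=lambda c: (NGRAM_COUNTS[left + (c,)], NGRAM_COUNTS[(c,)], c))
    return out
```

For `["the", "cat", "sat", "on", "the", "xat"]`, the left context "on the" favours "mat" over the more frequent unigram "cat". With no left context, the sketch falls back to unigram frequency. Detecting real-word errors would also require scoring known words against their context, which this sketch does not attempt.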
Utilizing web data in identification and correction of OCR errors
2013
Document Recognition and Retrieval XXI
In particular, we point out the shortcomings of our approach in its ability to suggest proper candidates to correct the remaining errors. ...
We then use a combination of the Longest Common Subsequences (LCS) and Bayesian estimates to automatically pick the proper candidate. ...
There is current research on a new post-processing method and algorithm for OCR error correction based on the huge database of Google's online web search engine. ...
doi:10.1117/12.2042403
dblp:conf/drr/TaghvaA14
fatcat:fynbvw6i4nf2joowpiymtlh3ju
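The snippet above mentions combining the Longest Common Subsequence (LCS) with Bayesian estimates to pick a candidate. The sketch below is one hypothetical way to do that, not the paper's method. It makes three assumptions:
- The likelihood P(OCR word | candidate) is approximated by a normalized LCS similarity.
- The prior P(candidate) is approximated by relative frequency, using made-up, positive `candidate_freqs` values.
- The chosen candidate is the one that maximizes the product of the two.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def pick_candidate(ocr_word: str, candidate_freqs: dict[str, int]) -> str:
    """Pick the candidate maximizing (LCS similarity) * (relative frequency)."""
    total = sum(candidate_freqs.values())

    def score(cand: str) -> float:
        # Likelihood proxy: normalized LCS similarity in [0, 1] (assumption, not a true channel model).
        denom = len(ocr_word) + len(cand) or 1
        likelihood = 2 * lcs_length(ocr_word, cand) / denom
        # Prior: relative frequency of the candidate among the supplied candidates (assumption).
        prior = candidate_freqs[cand] / total
        return likelihood * prior  # proportional to the posterior under these assumptions

    return max(candidate_freqs, key=score)
```

With the hypothetical frequencies `{"the": 900, "tube": 50, "be": 500}`, the OCR token "tbe" resolves to "the". "tube" shares more characters with it, but its low prior outweighs that advantage.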
Vartani Spellcheck – Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance
[article]
2020
arXiv
pre-print
Automatic spelling error detection and context-sensitive error correction can be used to improve accuracy by post-processing the text generated by these OCR systems. ...
A majority of previously developed language models for error correction of Hindi spelling have been context-free. ...
ACKNOWLEDGMENT We are extremely thankful to Google Colaboratory [24] and their powerful hardware which was used to run our BERT models. ...
arXiv:2012.07652v1
fatcat:u55lueliknbrhn3m4uvhxrw4qi
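One common way to combine a masked language model such as BERT with Levenshtein distance works in two steps. The model proposes words that fit the context, and only those spelled close to the OCR token are kept. The sketch below shows that filtering logic only, not the Vartani Spellcheck implementation. It makes no BERT call: the model's proposals are assumed to be given as a hypothetical `lm_candidates` dictionary of scores, and `max_dist` is a hypothetical threshold. Because Python compares strings by code point, the distance also applies to Devanagari text.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]


def pick_correction(ocr_token: str, lm_candidates: dict[str, float], max_dist: int = 2) -> str:
    """Keep LM proposals within max_dist edits of the OCR token; return the best-scored one."""
    close = {w: s for w, s in lm_candidates.items() if levenshtein(ocr_token, w) <= max_dist}
    if not close:
        return ocr_token  # no plausible correction: keep the OCR output unchanged
    return max(close, key=close.get)
```

For the OCR token "hause" with hypothetical proposals `{"house": 0.7, "flat": 0.2, "horse": 0.1}`, "flat" is discarded as too far in spelling, and "house" wins on score. Ranking by the language-model score inside the distance threshold lets context decide between spellings that are all plausible.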
Multi-modular domain-tailored OCR post-correction
2017
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Texts have to be digitized in an expensive and time-consuming process, and Optical Character Recognition (OCR) post-correction is one of the time-critical factors. ...
Using OCR post-correction as an example, we show the adaptation of a generic system to solve a specific problem with little data. ...
Bassil and Alwani (2012) use Google's online spelling suggestions, as they draw on a huge lexicon based on contents gathered from all over the web. ...
doi:10.18653/v1/d17-1288
dblp:conf/emnlp/SchulzK17
fatcat:fzsxrwqa4bgdxizbcwk3hbxxju
Survey of Post-OCR Processing Approaches
2021
Zenodo
We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches. ...
Additionally, many texts have already been processed by various out-of-date digitisation techniques. As a consequence, digitised texts are noisy and need to be post-corrected. ...
The local profile provides a ranked list of possible correction suggestions for each OCR token by using the ground truth and historical spelling variations. ...
doi:10.5281/zenodo.4640070
fatcat:6jnyehazujadvejgls6vpnu6ta
Survey of Post-OCR Processing Approaches
2021
Zenodo
We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches. ...
Additionally, many texts have already been processed by various out-of-date digitisation techniques. As a consequence, digitised texts are noisy and need to be post-corrected. ...
If the query contains errors, the search engine will suggest some replaceable words for misspellings. These suggestions are used as corrections for OCR errors. ...
doi:10.5281/zenodo.4635569
fatcat:x5qoluap7rgyxakv5lm5qcysya
An Improved Text Extraction Approach with Auto Encoder for Creating Your Own Audiobook
2022
International Journal of Information Retrieval Research
Additional improvements are made to the quality of text extraction, and a post-processing spell-check mechanism is incorporated for this purpose. ...
This is followed by text extraction with the help of OCR engines. ...
Following this, we have designed post processing steps to identify the errors and suggest suitable alternatives. ...
doi:10.4018/ijirr.289570
fatcat:zjmtlsoxzveuxfn5cw2dzv6wka
Corpus-based technique for improving Arabic OCR system
2021
Indonesian Journal of Electrical Engineering and Computer Science
The OCR process poses several challenges, particularly for the Arabic language, where it produces a high percentage of errors. ...
This method includes detecting and correcting non-word and real-word errors according to the context of the word in the sentence. ...
Doush and Al-Trad (2016) [16] developed an AOCR post-processing system based on three strategies: Google's online suggestion system, the Ayaspell spell checker combined with Google's online suggestion system, and Microsoft Office ...
doi:10.11591/ijeecs.v21.i1.pp233-241
fatcat:qhvklmpwzveddatdyw5w5tv6w4
A Rule-Based Post-Processing Approach to Improve Persian OCR Performance
2020
Scientia Iranica. International Journal of Science and Technology
This paper proposes a Persian OCR post-processing technique to increase the accuracy of the OCR systems dealing with real-world challenging samples. ...
The accuracy of Khana and Bina on images with a complex structure is 39% and 58%, respectively, while after applying the proposed post-processing method, the accuracy increases to 93% and 91%, respectively ...
[24] , used the Google online spelling suggestion to get common spelling suggestions on the English and Arabic languages. ...
doi:10.24200/sci.2020.53435.3267
fatcat:nqgv6grl7bcchapr5pdgrs3tvq
A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing step in many natural language processing (NLP) applications, but also facilitates reading and understanding ...
In this paper, we propose a novel approach to constructing a CSC corpus with automatically generated spelling errors, which are either visually or phonologically similar characters, corresponding to the ...
Acknowledgements The authors want to express special thanks to Xixin Wu for his suggestions and help in the experiment of ASR. ...
doi:10.18653/v1/d18-1273
dblp:conf/emnlp/WangSLHZ18
fatcat:2oqyi2fleff5lftjig4xvpedgy
On OCR ground truths and OCR post-correction gold standards, tools and formats
2014
Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14
Connecting the dots, we discuss the difference we perceive between OCR ground truths and OCR post-correction gold standards and their respective contributions. ...
We give an overview of activities undertaken in the sidelines of our automatic OCR post-correction core business over the past few years. ...
processing system' TICCLops online. ...
doi:10.1145/2595188.2595216
dblp:conf/datech/Reynaert14
fatcat:ctwkfw2jxbepdcqsendjy7a3ya
Image-based mobile service: automatic text extraction and translation
2010
Multimedia on Mobile Devices 2010
The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. ...
For a definition of cloud computing, see http://en.wikipedia.org/wiki/Cloud_computing. As illustrated in Fig. 1, the translation process is initiated by a user sending a picture by MMS (a). ...
The usage of a spell correction algorithm specifically designed for post-processing OCR output is likely to be more powerful than the generic spell checker we use in our prototype. ...
doi:10.1117/12.840279
fatcat:nko3t7gvrndbzja7o66r2cbqbi
A Survey on Easy OCR Techniques used to build Systems for Visually Impaired People
2018
International Journal for Research in Applied Science and Engineering Technology
Various tools, algorithms, and implementations are available to recognize characters in images. ...
This survey paper presents a quick summary of popular OCR techniques such as matrix matching, feature extraction, and neural-network-based OCR, and discusses the OCR software system Tesseract. ...
Post-processing: error correction, grammar correction, spell checking, etc. are done in this step. II. HISTORY 1) 1870s - C. R. ...
doi:10.22214/ijraset.2018.1216
fatcat:zk7qkt5xnrbulagi6aoj3c46hm
Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use
2021
Code4Lib Journal
The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing. ...
This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. ...
Evaluating the accuracy of OCR for a project in advance also helps to estimate the time needed to correct errors after OCR (Clausner 2020). ...
doaj:9fa148ca8abb4fb191d95a14866d52e2
fatcat:gboglyyzuvey3lt5accs54t43q
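Evaluating OCR accuracy in advance usually means comparing OCR output against a small hand-transcribed sample. The sketch below assumes the common character error rate (CER) definition: edit distance divided by reference length. This is not necessarily the metric the paper prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance between OCR output and ground truth, over ground-truth length."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `character_error_rate("hello world", "hallo world")` is 1/11 (about 0.09). Note that CER can exceed 1 when the OCR output is much longer than the reference, so "accuracy = 1 - CER" should be read with care.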