
MyOcrTool: Visualization System for Generating Associative Images of Chinese Characters in Smart Devices

Laxmisha Rai, Hong Li, Abd E.I.-Baset Hassanien
2021 Complexity  
The proposed Chinese character recognition system and visualization tool is named as MyOcrTool and developed for Android platform.  ...  The application recognizes the Chinese characters through OCR engine, and uses the internal voice playback interface to realize the audio functions and display the visual images of Chinese characters in  ...  In addition, several works in the past focused on OCR in Android applications [35, 36], real-time OCR [37], character readability on smart phones [38], character recognition models suitable for handheld  ... 
doi:10.1155/2021/5583287 fatcat:adgxot2isrdy3ojkokiexwl2bu

Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese [article]

Marek Rychlik, Dwight Nwaigwe, Yan Han, Dylan Murphy
2020 arXiv   pre-print
We also describe approaches geared towards Traditional Chinese, which is non-cursive, but features an extremely large character set of 65,000 characters.  ...  Our methodology is based on Machine Learning, especially Deep Learning, and Data Science, and is directed towards vast quantities of original documents, exceeding a billion pages.  ...  code, if user-trained models are supported (e.g., for Kraken and Tesseract).  ... 
arXiv:2005.08650v1 fatcat:3nmbzaz72vgwnab2ts7iz6ugly

Recognition of Devanagari Scene Text Using Autoencoder CNN

Sankirti Sandeep Shiravale, Jayadevan R, Sanjeev S Sannakki
2021 ELCVIA Electronic Letters on Computer Vision and Image Analysis  
The model is trained with Devanagari scene text images for pixel-wise classification of text and background. The segmented text is then recognized using an existing OCR engine (Tesseract).  ...  A ground-truth dataset containing Devanagari scene text images is prepared for the experimentation. An encoder-decoder convolutional neural network model is used for text/background segmentation.  ...  Considering the complexity of natural scene images and adaptability towards generalising recognition, this paper proposes recognition based on Tesseract OCR engine.  ... 
doi:10.5565/rev/elcvia.1344 fatcat:o4huvndmibeopel3jwhgcc4x7i

A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check

Dingmin Wang, Yan Song, Jing Li, Jialong Han, Haisong Zhang
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
However, to utilize data-driven approaches for CSC, there is one major limitation that annotated corpora are not enough in applying algorithms and building models.  ...  Upon the constructed corpus, different models are trained and evaluated for CSC with respect to three standard test sets.  ...  Besides, the authors would like to thank Li Zhong, Shuming Shi, Garbriel Fung, Kam-Fai Wong, and three anonymous reviewers for their help and insightful comments on various aspects of this work.  ... 
doi:10.18653/v1/d18-1273 dblp:conf/emnlp/WangSLHZ18 fatcat:2oqyi2fleff5lftjig4xvpedgy

Extensible System for Optical Character Recognition of Maintenance Documents

John Anthony Labarga, Amardeep Singh, Vera Zaychik Moffitt
2018 Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM  
This paper describes a flexible system for converting paper forms into digital documents through Optical Character Recognition (OCR), utilizing open source tools and packages.  ...  This system allows for the incorporation of business rules and processes that deliver high fidelity digital copies.  ...  Section 5 describes how to modify Tesseract's character recognition models to adapt it for custom fonts.  ... 
doi:10.36001/phmconf.2018.v10i1.480 fatcat:bribomxkkbfzjnhe4sygmaz65e

A BLSTM Network for Printed Bengali OCR System with High Accuracy [article]

Debabrata Paul, Bidyut Baran Chaudhuri
2019 arXiv   pre-print
For example, Bengali character for 'RA' is sometimes recognized as that of Assamese, mainly in conjunct consonant forms. Our OCR is free from such errors.  ...  It sometimes recognizes a character of Bengali into the same character of a non-Bengali script, especially Assamese, which has no distinction from Bengali, except for a few characters.  ...  [19] have used a hybrid architecture for recognition of printed text lines in degraded documents. Chavan et al.  ... 
arXiv:1908.08674v1 fatcat:oi3tkwqndbgmzewpjufvu42rri

Recognition of Characters from Streaming Videos [chapter]

Arpan Pal, Aniruddha Sinha, Tanushyam Chattopadhyay
2010 Character Recognition  
In addition to describing the OCR theory, it also introduces two popular Open-source OCR technologies available - GOCR (GNU Optical Character Recognition) and Tesseract OCR, and gives a comparative analysis  ...  Optical Character Recognition The pre-processed output is fed to standard OCR engines for recognizing the characters.  ...  The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.  ... 
doi:10.5772/9777 fatcat:t3b7ldpwdfanpiaad3n6ldcqfe

Analog Document Search Using CRNN and Keyphrase Extraction

Lokeshwar S (Bangalore Institute of Technology (VTU), Bengaluru, Karnataka, India), Vadiraja Rao M. K, Sujay Kumar P. S, Vishveshwara Guthal Gowda, Hemavathi P.
2021 International Journal of Image Graphics and Signal Processing  
With advances in Optical Character Recognition (OCR), Style Transfer Mapping (STM), and efficient key phrasing, we are now able to digitalize the document to a form that can be read across multiple platforms  ...  We propose a system that uses the CRNN model to detect English characters in the document with high accuracy.  ...  Based on this work, we performed image processing using Tesseract before passing it as input to the OCR model.  ... 
doi:10.5815/ijigsp.2021.02.02 fatcat:76vzxjvo2fhsngnbljfgwo7jgi

On-Device Information Extraction from Screenshots in form of tags [article]

Sumit Kumar, Gopi Ramena, Manoj Goyal, Debi Mohanty, Ankur Agarwal, Benu Changmai, Sukumar Moharana
2020 arXiv   pre-print
We developed novel architectures for components in the pipeline, optimized performance and memory for on-device computation.  ...  of screenshots, 2) identified script present in image, 3) extracted unstructured text from images, 4) identified language of the extracted text, 5) extracted keywords from the text, 6) identified tags based  ...  modules as described in Section 3 apart from MLKit and Tesseract OCR. The accuracy of tags is evaluated and shown how tags can be generated automatically based on OCR and image analysis.  ... 
arXiv:2001.06094v1 fatcat:mknzo7ngm5c4vldaico4hqilnm

Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features [article]

Maroua Tounsi, Ikram Moalla, Frank Lebourgeois, Adel M. Alimi
2018 arXiv   pre-print
In this paper, we extended the Bag of Features (BoF)-based model using deep learning for representing features for accurate SCR of different languages.  ...  Our system was evaluated extensively on all the scene character datasets of five different languages. The experimental results proved the efficiency of our system for a multilingual SCR.  ...  Scene character recognition based on BoF It is worth noting that the BoF framework seems to be an efficient model for object recognition.  ... 
arXiv:1806.07374v4 fatcat:edkrvvarazaurok7cql2aot74a

Unconstrained Scene Text and Video Text Recognition for Arabic Script [article]

Mohit Jain, Minesh Mathew, C.V. Jawahar
2017 arXiv   pre-print
Our implementation is built on top of the model introduced here [37] which is proven quite effective for English scene text recognition.  ...  This does away with the need for segmenting input image into constituent characters/glyphs, which is often difficult for Arabic script.  ...  ACKNOWLEDGMENT The authors would like to thank Maaz Anwar, Anjali, Saumya, Vignesh and Rohan for helping annotate the Arabic scenetext dataset and Dr. Girish Varma for his timely help and discussions.  ... 
arXiv:1711.02396v1 fatcat:mv2oibn3vveqpkitrido5qiiki

Label transition and selection pruning and automatic decoding parameter optimization for time-synchronous Viterbi decoding

Yasuhisa Fujii, Dmitriy Genzel, Ashok C. Popat, Remco Teunen
2015 2015 13th International Conference on Document Analysis and Recognition (ICDAR)  
Hidden Markov Model (HMM)-based classifiers have been successfully used for sequential labeling problems such as speech recognition and optical character recognition for decades.  ...  We also propose a novel technique to estimate the parameters based on a loss value without relying on a grid search.  ...  ACKNOWLEDGMENT The authors would like to thank Ray Smith for providing the layout engine.  ... 
doi:10.1109/icdar.2015.7333863 dblp:conf/icdar/FujiiGPT15 fatcat:kmrc2cfo7bfhpajupybvrwcuku

Graph-Based Keyword Spotting in Historical Documents Using Context-Aware Hausdorff Edit Distance

Michael Stauffer, Andreas Fischer, Kaspar Riesen
2018 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)  
ACKNOWLEDGMENT The authors would like to thank the Siemens Postal, Parcel & Airport Logistics GmbH for funding this work.  ...  CONCLUSION This paper has presented a candidate reduction technique based on the Hierarchical Overlapping Clustering for accelerating handwritten Chinese character recognition.  ...  In our previous work [3], we developed a training data augmentation method and a scene Chinese character recognition method based on the ensemble learning strategy to improve the recognition accuracy  ... 
doi:10.1109/das.2018.31 dblp:conf/das/Stauffer0R18 fatcat:2r2cjpiitfcs5knjtqbfvcuwsi

Text Detection and Recognition in Imagery: A Survey

Qixiang Ye, David Doermann
2015 IEEE Transactions on Pattern Analysis and Machine Intelligence  
This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery.  ...  Existing techniques are categorized as either stepwise or integrated and subproblems are highlighted including text localization, verification, segmentation and recognition.  ...  They would also like to thank Tao Wang of Stanford, Kai Wang of UCSD, Chongzhao Shi of Chinese Academy of Sciences, and Chew Lim Tan of the National University of Singapore for providing images.  ... 
doi:10.1109/tpami.2014.2366765 pmid:26352454 fatcat:cuz3qhkglnahdebxqptbsgpjmm

Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning

Yi-Quan Li, Hao-Sen Chang, Daw-Tung Lin
2022 Applied Sciences  
With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR  ...  In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters by using deep learning neural networks and fine-tuning techniques.  ...  Traditionally, optical character recognition (OCR) has been used for text recognition and it has achieved good results.  ... 
doi:10.3390/app12020907 fatcat:xzn6warh4raffkfhxeppn7n3im