A Word & Character N-Gram based Arabic OCR Error Simulation model

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith
2013 INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY  
The proposed model is based on simulating the Arabic OCR recognition mistakes based on both word-based and character N-gram approaches. Then we expand the user search query using the expected OCR errors.  ...  The retrieval effectiveness of the new model is 93%, while the best effectiveness published for the word-based approach was 84% and the best effectiveness for the character-based approach was 56%.  ...  and a set of experiments that were designed to identify the effect of the proposed model on retrieval effectiveness.  ... 
doi:10.24297/ijct.v12i8.2999 fatcat:b3fk3peepfhe5fjuaxfldn25hy

OCR Error Correction Using Character Correction and Feature-Based Word Classification [article]

Ido Kissos, Nachum Dershowitz
2016 arXiv   pre-print
, the most frequent types of error on our dataset.  ...  This paper explores the use of a learned classifier for post-OCR text correction.  ...  This paper is organized as follows: Section 2 provides background information on Arabic OCR and OCR error correction. Section 3 presents the error correction methodology.  ... 
arXiv:1604.06225v1 fatcat:5wunhbghdbcpbdhvlaildhzg7y

Term selection for searching printed Arabic

Kareem Darwish, Douglas W. Oard
2002 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02  
Since many Arabic documents are available only in print, automating retrieval from collections of scanned Arabic document images using Optical Character Recognition (OCR) is an interesting problem.  ...  Character n-grams or lightly stemmed words were found to typically yield near-optimal retrieval effectiveness, and combining both types of terms resulted in robust performance across a broad range of conditions  ...  In section 5, we explore the effect of OCR-degraded Arabic text on retrieval effectiveness.  ... 
doi:10.1145/564422.564423 fatcat:5e6l476wzndwtbkq2wktexcwsy

Term selection for searching printed Arabic

Kareem Darwish, Douglas W. Oard
2002 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02  
Since many Arabic documents are available only in print, automating retrieval from collections of scanned Arabic document images using Optical Character Recognition (OCR) is an interesting problem.  ...  Character n-grams or lightly stemmed words were found to typically yield near-optimal retrieval effectiveness, and combining both types of terms resulted in robust performance across a broad range of conditions  ...  In section 5, we explore the effect of OCR-degraded Arabic text on retrieval effectiveness.  ... 
doi:10.1145/564376.564423 dblp:conf/sigir/DarwishO02 fatcat:6kwqwvac2zhp3nntfxa3o76qgy

Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image

Hassanin M. Al-Barhamtoshy, Kamal M. Jambi, Sherif M. Abdou, Mohsen A. Rashwan
2021 IEEE Access  
Consequently, ADIR services provide general functions of the Arabic OCR to compose a large number of other services in the OCR domain.  ...  This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and OCR services.  ...  ACKNOWLEDGEMENT This project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, the Kingdom of Saudi Arabia, award number  ... 
doi:10.1109/access.2021.3066477 fatcat:qiuq5kj6vbfuhge26psnsk25ui

Probabilistic structured query methods

Kareem Darwish, Douglas W. Oard
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03  
Statistically significant improvements in retrieval effectiveness are demonstrated for cross-language retrieval and for retrieval based on optical character recognition when replacement probabilities are  ...  Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute the weight for each query term.  ...  Results OCR-BASED RETRIEVAL Previous approaches to retrieval of OCR-degraded text have focused primarily on correcting OCR errors [7] [15] or on fuzzy matching techniques that are less sensitive than  ... 
doi:10.1145/860435.860497 dblp:conf/sigir/DarwishO03 fatcat:zfimlam5efdazkhuby5eiq64ou

Probabilistic structured query methods

Kareem Darwish, Douglas W. Oard
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03  
Statistically significant improvements in retrieval effectiveness are demonstrated for cross-language retrieval and for retrieval based on optical character recognition when replacement probabilities are  ...  Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute the weight for each query term.  ...  Results OCR-BASED RETRIEVAL Previous approaches to retrieval of OCR-degraded text have focused primarily on correcting OCR errors [7] [15] or on fuzzy matching techniques that are less sensitive than  ... 
doi:10.1145/860496.860497 fatcat:sikggoyh4rab7lvcvbaz6dnq4i

Information retrieval and OCR

Jamie Callan, Paul Kantor, David Grossman
2002 SIGIR Forum  
Acknowledgements We thank the organizers of the SIGIR conference for their excellent support of the workshop, and the Tampere Chamber of Commerce for spectacular weather.  ...  One conclusion is that this approach may be most effective on "difficult" languages such as Hebrew and Arabic.  ...  Adenike Lam-Adesina presented "Examining the Effectiveness of IR Techniques for Document Image Retrieval" which studied the use of automatic relevance feedback on OCR documents.  ... 
doi:10.1145/792550.792561 fatcat:j4jgqbvsavcthn6wgmnd46xu6e

Arabic Information Retrieval

Kareem Darwish
2014 Foundations and Trends in Information Retrieval  
Error correction vs. query garbling for Arabic OCR document retrieval. In ACM Transactions on Information Systems (TOIS), Vol. 26. [54] Kareem Darwish. 2013.  ...  The effect of blind relevance feedback on a new Arabic OCR degraded text collection. International Conference on Machine Intelligence: Special Session on Arabic Document Image Analysis.  ... 
doi:10.1561/1500000031 fatcat:2nxjdu43erhdvbs35ykavrk76a

A Review of Arabic Optical Character Recognition Techniques & Performance

Yazan M Alwaqfi, Mumtazimah Mohamad
2020 International Journal of Engineering Trends and Technology  
The characteristics of Arabic text cause more errors than in English text in OCR. The aim of this paper is to analyze the related works and issues in Arabic language OCRs.  ...  In addition, the review of deep learning for Arabic OCR systems and researches is very important and useful.  ...  ACKNOWLEDGEMENT This work is supported by UniSZA Center of Excellence Management and Research Incubator and University Sultan Zainal Abidin, Terengganu, Malaysia.  ... 
doi:10.14445/22315381/cati1p208 fatcat:ilgf3ea7pfa45e2uo7iv7dszxe

NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR

Mohamed Ben, Hichem Karray, Adel. M., Ana Fernández
2012 International Journal of Advanced Computer Science and Applications  
In this paper we propose a robust approach for text extraction and recognition from video clips which is called Neuro-Fuzzy system for Arabic Video OCR.  ...  The emergence of artificial text is consequently vigilantly directed. This type of text carries with it important information that helps in video referencing, indexing and retrieval.  ...  ACKNOWLEDGMENT The authors would like to acknowledge the financial support of this work by grants from General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.  ... 
doi:10.14569/ijacsa.2012.031022 fatcat:exnrssn7orgxpi73pyr275b3ze

Document Analysis Systems for Digital Libraries: Challenges and Opportunities [chapter]

Henry S. Baird, Venugopal Govindaraju, Daniel P. Lopresti
2004 Lecture Notes in Computer Science  
The state-of-the-art is summarized, including a digest of themes that emerged during the recent International Workshop on Document Image Analysis for Libraries.  ...  DL's; (c) the presentation of doc-images to DL users; (d) navigation within and among doc-images in DL's; and (e) effective use of personal and interactive DL's.  ...  The excellent survey [20] summarized the state of the art (in 1997) of retrieval of entire multi-page articles as follows: 1. at OCR character error rates below 5%, IR methods suffer little loss of either  ... 
doi:10.1007/978-3-540-28640-0_1 fatcat:3szb2elcm5amvlhvma3kbwmzza

Improving Arabic Instant Machine Translation: The Case of Arabic Triangle of Language

Tala M. Albashir, Hussien Alzoubi, Mohammad Albatainih
2020 Journal of Computer Science  
Character Recognition (OCR).  ...  Recently, instant translator applications have become very useful when traveling, especially when one knows little about the language of the country she/he is traveling to.  ...  The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.  ... 
doi:10.3844/jcssp.2020.956.965 fatcat:hfvm75z5wvaqbpcagrkpdrtyjy

An automatic linking service of document images reducing the effects of OCR errors with latent semantics

Renato F. Bulcão-Neto, José Camacho-Guerrero, Álvaro Barreiro, Javier Parapar, Alessandra A. Macedo
2010 Proceedings of the 2010 ACM Symposium on Applied Computing - SAC '10  
Results show the feasibility of LinkDI relating OCR output with high degradation.  ...  Robust Information Retrieval (IR) systems have been demanded due to the widespread and multipurpose use of document images, and the high number of document images repositories available nowadays.  ...  In recent years, Magdy and Darwish [11] investigated the effect of OCR correction techniques on the effectiveness of retrieving Arabic document images using distinct index terms.  ... 
doi:10.1145/1774088.1774092 dblp:conf/sac/NetoGBPM10 fatcat:xqujwg56qvhgrm7zev6467qfna

Integrated segmentation and recognition of connected Ottoman script

Ismet Zeki Yalniz
2009 Optical Engineering: The Journal of SPIE  
Next, a function is defined for scoring each different syntactically correct sequence of these candidate letters.  ...  In a further set of experiments, we also demonstrate that the framework can be used as a building block for an information retrieval system for digital Ottoman archives.  ...  IR Experiments We analyzed the possible effect of OCR errors on IR performance.  ... 
doi:10.1117/1.3262346 fatcat:v2nmiropqbddbezvtdz3vgzzea