Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image

Hassanin M. Al-Barhamtoshy, Kamal M. Jambi, Sherif M. Abdou, Mohsen A. Rashwan
2021 IEEE Access  
This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and as OCR services. Different services that support document analysis, retrieval, and processing, including dataset preparation and recognition, are discussed. The ADIR services provide the general functions of Arabic OCR, from which a large number of other services in the OCR domain can be composed. Furthermore, the proposed work can provide access to different methods ... document layout analysis with a platform where they can share and handle such methods (services) without any setup requirements. One of the datasets used is composed of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter from Alif to Ya 10 times in two forms. The forms were scanned at 3000 DPI resolution, and the letters are split into two sets: a training set with 13,440 letters (480 images per class label) and a testing set with 3,360 letters (120 images per class label). A convolutional neural network (CNN) is used and adapted for Arabic handwritten letter classification. In an experimental test, we report a 100% classification accuracy rate on the testing images. The ADIR services also provide a "service description", which includes an interface and a server's URL. The interface allows communication between clients and services. In this article, we also evaluate the IR results and compare them with their corrected equivalents.
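The dataset figures quoted in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' code; it assumes the class labels are the 28 letters of the Arabic alphabet (the abstract says only "from Alif to Ya"), and the constant and function names are hypothetical. It derives the total letter count from the writer and repetition counts, and the per-class image counts implied by the training and testing set sizes.

```python
# Sanity check of the dataset figures quoted in the abstract.
# Assumption (not stated in the abstract): 28 class labels,
# one per letter of the Arabic alphabet, Alif to Ya.
NUM_CLASSES = 28
NUM_WRITERS = 60      # from the abstract
REPETITIONS = 10      # each writer wrote each letter 10 times

TRAIN_TOTAL = 13_440  # training set size from the abstract
TEST_TOTAL = 3_360    # testing set size from the abstract


def per_class(total: int, num_classes: int = NUM_CLASSES) -> int:
    """Return images per class, requiring an even split across classes."""
    if total % num_classes != 0:
        raise ValueError("total is not evenly divisible by the class count")
    return total // num_classes


total_letters = NUM_CLASSES * NUM_WRITERS * REPETITIONS
train_per_class = per_class(TRAIN_TOTAL)
test_per_class = per_class(TEST_TOTAL)
train_fraction = TRAIN_TOTAL / total_letters

print(f"total letters:      {total_letters}")
print(f"train per class:    {train_per_class}")
print(f"test per class:     {test_per_class}")
print(f"training fraction:  {train_fraction:.2f}")
```

Under the 28-class assumption, the two set sizes add up to the stated total and correspond to an 80/20 split between training and testing.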
doi:10.1109/access.2021.3066477 fatcat:qiuq5kj6vbfuhge26psnsk25ui