Arabic Handwritten Digit Recognition Based on Restricted Boltzmann Machine and Convolutional Neural Networks

Ali Alani
2017 Information  
Handwritten digit recognition is an open problem in computer vision and pattern recognition, and it has attracted increasing interest. The main challenge is to design an efficient method that can recognize handwritten digits submitted by users via digital devices. Numerous methods have been proposed, both in the past and in recent years, to improve handwritten digit recognition in various languages. Research on handwritten digit recognition in
Arabic is limited. At present, deep learning algorithms are extremely popular in computer vision and are used to address important problems, such as image classification, natural language processing, and speech recognition, with the aim of giving computers sensory capabilities that approach those of humans. In this study, we propose a new approach for Arabic handwritten digit recognition that uses restricted Boltzmann machine (RBM) and convolutional neural network (CNN) deep learning algorithms. The approach works in two phases. First, in the feature extraction phase, we use the RBM, a deep learning technique that can extract highly useful features from raw data and that has been used as a feature extractor in several classification problems. Then, the extracted features are fed to an efficient CNN with a deep supervised learning architecture for training and testing. In the experiments, we used the CMATERDB 3.3.1 Arabic handwritten digit dataset to train and test the proposed method. Experimental results show that the proposed method significantly improves the accuracy rate, with accuracy reaching 98.59%. Finally, a comparison of our results with those of other studies on the CMATERDB 3.3.1 Arabic handwritten digit dataset shows that our approach achieves the highest accuracy rate.
[...] interest [1]. However, most research has focused on digits in English and some other European languages; English handwriting datasets are widely available, and significant results have been achieved [2, 7]. By contrast, little work has been done on Arabic handwritten digit recognition, owing to the complexity of the Arabic language and the lack of public Arabic handwritten digit datasets.
Arabic handwritten digit recognition faces many challenges, such as variations in writing style, size, shape, and slant, as well as image noise, all of which change the numeral topology [8]. To address these challenges, we focus on designing an efficient method that can recognize Arabic handwritten digits submitted by users via digital devices. Three main techniques, namely preprocessing, feature extraction, and classification [7], are usually used to design an efficient pattern recognition method. Preprocessing enhances data quality, extracts the relevant textual parts, and prepares the data for the recognition process. Its main objectives include dimensionality reduction, feature extraction, and compression of the amount of information to be retained, among others [9]. Preprocessing produces clean data that can be used directly and efficiently in the feature extraction stage. Feature extraction, in turn, is the key factor that affects the success of any recognition method. However, traditional hand-designed feature extraction techniques are tedious and time consuming and cannot process raw images, whereas automatic feature extraction methods retrieve useful features directly from images. Szarvas et al. [10] showed that the CNN-SVM combination performs well in pedestrian detection by using the automatically optimized features learned by the CNN. Mori et al. [11] used time-domain encoding schemes, with modules for different parts of the images, to train a convolutional spiking NN. In their method, the output of each layer is fed as features to the SVM, and 100% face recognition accuracy is obtained on 600 images of 20 people. Furthermore, the authors in [12] presented an automatic feature extraction method based on CNN.
By using the trainable feature extractor plus affine and elastic distortions, that method obtains low error rates of 0.54% and 0.56% on the handwritten digit recognition problem. Feature extraction techniques are therefore considered among the most important steps for increasing classification performance; several feature extraction methods are available in [13-18]. The final step in a handwritten digit recognition application is image classification, a branch of computer vision that has been applied extensively in many real-world contexts, such as handwriting image classification [1, 19], facial recognition [20], remote sensing [21], and hyperspectral imaging [22]. Image classification aims to assign images to specified classes. Two types of classification methods are used in computer vision: the appearance-based method and the feature-based method. The most commonly used in the literature is the feature-based method, which extracts features from the images and then uses these features directly to improve the classification results [23]. In recent years, finding an effective algorithm for feature extraction has become an important issue in object recognition and image classification. Recent developments in graphics processing unit (GPU) technology and in artificial intelligence, such as deep learning algorithms, show promising results in image classification and feature extraction. Therefore, in this study, we emphasize the use of deep learning algorithms for handwritten digit recognition. Deep learning algorithms are a subset of machine learning techniques that use multiple levels of distributed representations to learn high-level abstractions in data. At present, numerous traditional artificial intelligence problems, such as semantic parsing, transfer learning, and natural language processing [2, 5, 24], have been addressed using deep learning techniques.
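The three stages named earlier (preprocessing, feature extraction, classification) and the feature-based method can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the 4x4 toy images, the zone-density features, the nearest-centroid classifier, and all function names are hypothetical and are not the method proposed in this study. The features here are hand-designed, which is exactly the kind of step that automatic feature extraction aims to replace.

```python
# Toy feature-based classifier: preprocess -> extract features -> classify.
# All names and data are hypothetical; image sides must be divisible by `zone`.

def binarize(image, threshold=0.5):
    """Preprocessing: map grayscale pixels in [0, 1] to 0 or 1."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def zone_densities(image, zone=2):
    """Feature extraction: fraction of 'ink' pixels in each zone x zone block."""
    features = []
    for r in range(0, len(image), zone):
        for c in range(0, len(image[0]), zone):
            block = [image[i][j]
                     for i in range(r, r + zone)
                     for j in range(c, c + zone)]
            features.append(sum(block) / len(block))
    return features

def extract(image):
    return zone_densities(binarize(image))

def fit_centroids(images, labels):
    """Classification (training): mean feature vector per class."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        feats = extract(image)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, value in enumerate(feats):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, image):
    """Classification (inference): label of the nearest class centroid."""
    feats = extract(image)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(feats, center))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Two made-up 4x4 "glyphs": a vertical stroke and a horizontal stroke.
VERTICAL = [[0.0, 0.9, 0.0, 0.0] for _ in range(4)]
HORIZONTAL = [[0.0] * 4, [0.9] * 4, [0.0] * 4, [0.0] * 4]

centroids = fit_centroids([VERTICAL, HORIZONTAL], ["vertical", "horizontal"])
noisy_vertical = [
    [0.1, 0.8, 0.2, 0.0],
    [0.0, 0.7, 0.0, 0.1],
    [0.0, 0.9, 0.1, 0.0],
    [0.2, 0.6, 0.0, 0.0],
]
print(predict(centroids, noisy_vertical))
```

In this sketch the classifier never sees raw pixels, only the extracted features, so the quality of `extract` largely determines the result. That dependence is why feature extraction is singled out above as a key factor.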
The main property of deep learning methods is that they learn effective, high-level features by using deep architectures in an unsupervised manner, without relying on labeled data [25]. To achieve this goal, the layers of the network are arranged hierarchically to form a deep architecture. Each layer learns a new representation from the output of the previous layer, with the goal of modeling the different explanatory factors of variation behind the data [26].
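To make the idea of unsupervised feature learning concrete, the following is a minimal sketch of a binary RBM trained with one step of contrastive divergence (CD-1), a standard training rule for RBMs. It is an illustration under stated assumptions, not the configuration used in this study: the class name `TinyRBM`, the layer sizes, the learning rate, the number of epochs, and the 6-pixel toy patterns are all hypothetical. No labels are used anywhere in training.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """Binary-binary RBM trained with CD-1 (hypothetical toy implementation)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible, self.n_hidden = n_visible, n_hidden
        # W[i][j] connects visible unit i to hidden unit j.
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v): the learned feature representation of v."""
        return [sigmoid(self.b_hid[j]
                        + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        """P(v_i = 1 | h): reconstruction of the input from hidden units."""
        return [sigmoid(self.b_vis[i]
                        + sum(h[j] * self.W[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        # Negative phase: one Gibbs step down to the visible layer and back up.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Move toward the data statistics and away from the model's reconstruction.
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.b_hid[j] += lr * (h0[j] - h1[j])

# Made-up 6-pixel binary patterns standing in for flattened digit images.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
rbm = TinyRBM(n_visible=6, n_hidden=3)
for _ in range(20):
    for v in data:
        rbm.cd1_update(v)

# Hidden-unit probabilities serve as extracted features for a later classifier.
features = rbm.hidden_probs(data[0])
print([round(p, 3) for p in features])
```

In a two-phase design of the kind described in the abstract, a vector such as `features` would be passed on to a supervised classifier (a CNN in this study). The sketch stops at the feature extraction step and uses only the Python standard library to stay self-contained.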
doi:10.3390/info8040142 fatcat:plgkosww3jfzdoukams372yhdy