An Effective Arabic Text Classification Approach Based on Kernel Naive Bayes Classifier

Raed Al-khurayji, Ahmed Sameh
2017 International Journal of Artificial Intelligence & Applications  
With the growing volume of electronic documents used in many applications, a fast and accurate text classification method is very important. Arabic text classification is one of the most challenging topics. This is probably because Arabic words have unlimited variation in meaning, in addition to problems that are specific to the Arabic language. Many studies have shown that the Naive Bayes (NB) classifier is relatively robust, easy to implement, fast, and accurate in many different fields such as text classification. However, non-linear classification problems and strong violations of the independence assumptions can lead to very poor performance of the NB classifier. In this paper, first, we preprocess the Arabic documents to tokenize only the Arabic words. Second, we convert those words into vectors using the term frequency and inverse document frequency (TF-IDF) technique. Third, we propose an efficient approach based on the Kernel Naive Bayes (KNB) classifier to solve the non-linearity problem of Arabic text classification. Finally, experimental results and performance evaluation on our collected dataset of Arabic topic mining corpus are presented, showing the effectiveness of the proposed KNB classifier against other baseline classifiers.
doi:10.5121/ijaia.2017.8601 fatcat:45pyue5d3zfhdo5mv7lmjrls7i
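
The three steps named in the abstract (Arabic-only tokenization, TF-IDF vectors, a kernel-based Naive Bayes) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the TF-IDF variant, or the kernel, so the Unicode range U+0621..U+064A, the smoothed IDF formula, the Gaussian kernel, and the bandwidth of 0.1 are all assumptions chosen for illustration, and the toy documents are invented.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: an "Arabic word" is a run of letters in U+0621..U+064A.
# Diacritics (U+064B and above) are outside this range and would split a token.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

def tokenize(text):
    """Keep only Arabic-letter tokens; digits, Latin text and punctuation are dropped."""
    return ARABIC_WORD.findall(text)

def fit_tfidf(docs):
    """Build a sorted vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(doc, vocab, idf):
    """Relative term frequency times IDF, as a dense list ordered by vocab."""
    counts = Counter(doc)
    total = len(doc) or 1
    return [counts[t] / total * idf[t] for t in vocab]

class KernelNaiveBayes:
    """Naive Bayes whose per-feature class-conditional densities are Gaussian KDEs."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth  # assumed default; tune on held-out data

    def fit(self, X, y):
        self.samples = defaultdict(list)
        for row, label in zip(X, y):
            self.samples[label].append(row)
        self.log_prior = {c: math.log(len(rows) / len(X))
                          for c, rows in self.samples.items()}
        return self

    def _log_density(self, value, train_values):
        # Average of Gaussian kernels centred on the training values of one feature.
        h = self.bandwidth
        norm = len(train_values) * h * math.sqrt(2 * math.pi)
        total = sum(math.exp(-0.5 * ((value - v) / h) ** 2) for v in train_values)
        return math.log(total / norm + 1e-300)  # guard against log(0)

    def predict(self, X):
        preds = []
        for row in X:
            scores = {}
            for c, rows in self.samples.items():
                # Independence assumption: sum per-feature log densities.
                s = self.log_prior[c]
                for j, value in enumerate(row):
                    s += self._log_density(value, [r[j] for r in rows])
                scores[c] = s
            preds.append(max(scores, key=scores.get))
        return preds

if __name__ == "__main__":
    # Invented toy corpus: two sports and two economy sentences.
    texts = ["فاز الفريق في المباراة", "سجل اللاعب هدفا في المباراة",
             "ارتفع سعر النفط في السوق", "انخفض سعر الذهب في السوق"]
    labels = ["sport", "sport", "economy", "economy"]
    docs = [tokenize(t) for t in texts]
    vocab, idf = fit_tfidf(docs)
    model = KernelNaiveBayes().fit([tfidf_vector(d, vocab, idf) for d in docs], labels)
    query = tfidf_vector(tokenize("فاز اللاعب في المباراة 2017"), vocab, idf)
    print(model.predict([query]))
```

The kernel density estimate replaces the single parametric density that a standard Naive Bayes fits per feature and class, which is one common way to relax the distributional assumption. The independence assumption across features is still used when the log densities are summed.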