Arabic Text Categorization Using Mixed Words

Mahmoud Hussein, Hamdy M. Mousa, Rouhia M. Sallam
2016 International Journal of Information Technology and Computer Science  
The number of Arabic text documents available online is very large and grows every day, so categorizing these documents has become very important. In this paper, an approach is proposed to enhance the accuracy of Arabic text categorization. It is based on a new feature representation technique that uses a mixture of a bag of words (BOW) and two adjacent words in different proportions. It also introduces a new feature selection technique that depends on Term Frequency (TF), and it uses the Frequency Ratio Accumulation Method (FRAM) as a classifier. Experiments are performed without normalization or stemming, with only one of them, and with both of them. In addition, three data sets of different categories have been collected from online Arabic documents for evaluating the proposed approach. The highest accuracy obtained is 98.61%, achieved with normalization.
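As a rough illustration of the ideas named in the abstract, the following minimal Python sketch builds mixed features (single words plus pairs of adjacent words) and classifies with a simplified frequency-ratio accumulation scheme. It is not the authors' implementation: the function names (`mixed_features`, `train_fram`, `classify`), the toy English tokens, and the exact ratio formula (a feature's count in a category divided by its count across all categories) are assumptions made for illustration. The sketch also omits the mixing proportions, the TF-based feature selection, normalization, and stemming.

```python
from collections import Counter, defaultdict


def mixed_features(tokens):
    """Return single words followed by pairs of adjacent words."""
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + pairs


def train_fram(docs):
    """Compute per-feature frequency ratios from (tokens, category) pairs.

    Assumed formulation: ratio(feature, category) is the feature's count
    in that category divided by its count over all categories.
    """
    freq = defaultdict(Counter)  # feature -> category -> count
    for tokens, category in docs:
        for feat in mixed_features(tokens):
            freq[feat][category] += 1
    ratios = {}
    for feat, per_cat in freq.items():
        total = sum(per_cat.values())
        ratios[feat] = {c: n / total for c, n in per_cat.items()}
    return ratios


def classify(ratios, tokens):
    """Accumulate ratios per category and return the best-scoring category.

    Returns None when no feature of the document was seen in training.
    """
    scores = Counter()
    for feat in mixed_features(tokens):
        for category, ratio in ratios.get(feat, {}).items():
            scores[category] += ratio
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy training data (already tokenized).
train = [
    (["team", "wins", "match"], "sports"),
    (["team", "scores", "goal"], "sports"),
    (["market", "prices", "rise"], "economy"),
    (["bank", "raises", "prices"], "economy"),
]
ratios = train_fram(train)
print(classify(ratios, ["team", "wins", "goal"]))
print(classify(ratios, ["market", "prices", "fall"]))
```

Keeping the adjacent-word pairs alongside the single words lets a phrase contribute its own ratio in addition to the ratios of its component words, which is the intuition behind mixing the two representations.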
doi:10.5815/ijitcs.2016.11.09 fatcat:6afnmfbfgnfdhjyjo4mzyfpk24