Arabic Document Classification Using Multiword Features

Diab Abuaiadah
2013 International Journal of Computer and Communication Engineering  
We investigate the use of multiword features to improve Arabic document classification. The Arabic language is both morphologically rich and highly inflected. Accordingly, it poses additional challenges in bringing Arabic information retrieval to a level comparable to that of English. The multiword features are modeled as combinations of words appearing within windows of varying sizes. Our experiments show that multiword features combined with the Dice similarity distance outperform the cosine similarity and produce results that are comparable to the TF-IDF representation. Multiword features are under-explored, and we believe they have the potential to improve Arabic information retrieval and, in particular, Arabic document classification.
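As a rough illustration of the idea described in the abstract, the following Python sketch builds window-based multiword features and compares two documents with the Dice coefficient. It is not the author's implementation: the abstract does not specify how words are combined, so treating a feature as an unordered pair of distinct words inside a sliding window is an assumption, and the names `multiword_features` and `dice_similarity` are hypothetical. The sample tokens are placeholders, and no Arabic preprocessing (stemming, normalization) is attempted.

```python
from itertools import combinations


def multiword_features(tokens, window):
    """Collect unordered pairs of distinct words that co-occur in a sliding window.

    Assumption: a "multiword feature" is a word pair; the paper may use
    larger or ordered combinations.
    """
    features = set()
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for a, b in combinations(span, 2):
            if a != b:
                # Sort so that (a, b) and (b, a) count as the same feature.
                features.add(tuple(sorted((a, b))))
    return features


def dice_similarity(x, y):
    """Dice coefficient between two feature sets: 2|X ∩ Y| / (|X| + |Y|)."""
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    # Placeholder documents, already tokenized.
    doc1 = "the central bank raised interest rates".split()
    doc2 = "the bank raised rates again".split()
    for window in (2, 3, 4):
        f1 = multiword_features(doc1, window)
        f2 = multiword_features(doc2, window)
        print(window, round(dice_similarity(f1, f2), 3))
```

Widening the window admits pairs of words that are farther apart, which is one way to read "windows of varying sizes"; a classifier could then assign a document to the class whose feature set yields the highest Dice score.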
doi:10.7763/ijcce.2013.v2.269 fatcat:bmtidmfzerdipeddjlc5ipfznu