Corpus-Based Arabic Stemming Using N-Grams
[chapter]
2010
Lecture Notes in Computer Science
In highly inflected languages such as Arabic, stemming improves text retrieval performance by reducing the number of word variants. We propose a modification of the corpus-based stemming approach that Xu and Croft proposed for English and Spanish so that it can stem Arabic words. In the first stage, we generate the conflation classes by clustering 3-gram representations of the words found in only 10% of the data. In the second stage, these clusters are refined using different similarity measures and
doi:10.1007/978-3-642-17187-1_27
fatcat:kpovw73hcjhltnnxstc4bwxwky
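
The first stage described in the abstract, clustering words by their 3-gram representations, can be illustrated with a minimal sketch. This is not the authors' implementation: the Dice coefficient, the threshold of 0.5, the greedy single-pass clustering, and the function names `char_ngrams`, `dice`, and `cluster_words` are all assumptions chosen for illustration, and the transliterated sample words are hypothetical.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word (no padding).

    Words shorter than n are kept whole so they still have a representation.
    """
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def cluster_words(words, threshold=0.5):
    """Greedily group words into candidate conflation classes.

    A word joins the first cluster containing any member whose 3-gram
    similarity reaches the threshold; otherwise it starts a new cluster.
    Both the greedy strategy and the threshold are illustrative assumptions.
    """
    clusters = []
    for word in words:
        grams = char_ngrams(word)
        for cluster in clusters:
            if any(dice(grams, char_ngrams(m)) >= threshold for m in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


if __name__ == "__main__":
    # Hypothetical transliterated sample; a real run would use corpus words.
    print(cluster_words(["kitab", "kitabi", "qalam"]))
```

In this sketch, "kitab" and "kitabi" share three of their 3-grams and fall into one class, while "qalam" shares none and forms its own. The second-stage refinement mentioned in the abstract is not sketched here.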