Towards an error-free Arabic stemming

Eiman Tamah Al-Shammari, Jessica Lin
2008 Proceeding of the 2nd ACM workshop on Improving non english web searching - iNEWS '08  
Stemming is a computational process for reducing words to their roots (or stems). It can be classified as a recall-enhancing or precision-enhancing component. Existing Arabic stemmers suffer from high stemming error rates. They blindly stem all words and perform poorly, especially with compound words, nouns, and foreign Arabized words. This paper presents the Educated Text Stemmer (ETS), a dictionary-free, simple, and highly effective Arabic stemming algorithm that reduces stemming errors while also decreasing computational time and data storage. The novelty of the work arises from its use of neglected Arabic stop-words. These stop-words can be highly important and can provide a significant improvement in processing Arabic documents. The ETS stemmer is evaluated by comparison with human-generated stemming output and with the stemming weight technique.
doi:10.1145/1460027.1460030 dblp:conf/cikm/Al-ShammariL08 fatcat:wol556egtzdhtc2zjpn3fkioda