DESIGN AND DEVELOPMENT OF DICTIONARY-BASED STEMMER FOR THE URDU LANGUAGE

Zahid Hussain, Sajid Iqbal, Tanzila Saba, Abdulaziz Almazyad, Amjad Rehman, Multan, Pakistan
2017 unpublished
Stemming reduces the many variant forms of a word to its base, stem, or root form, which is essential for language-processing applications including Urdu information retrieval (IR). Urdu is a resource-poor and morphologically rich language, and its multilingual vocabulary is challenging to process because of its complex morphology. Research on Urdu stemming is about a decade old; however, no work has been reported on dictionary-based Urdu stemming. The present work introduces a dictionary-based Urdu stemmer with improved performance compared to existing Urdu stemmers. The significance of the study lies in identifying the dictionary-based approach, especially with a dictionary-update feature, as the most promising approach to Urdu stemming. Testing shows 94.85% overall accuracy on the test data, and the results can be further improved by cleaning the test data and updating the dictionary.
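The abstract does not describe the stemmer's internals. The following is only a minimal sketch of how a generic dictionary-based stemmer with an update feature might be organized. It is not the authors' method. The names `DictionaryStemmer`, `SUFFIXES`, `update`, and `accuracy` are all hypothetical. The three suffixes and the tiny word lists are illustrative assumptions, not the paper's rule set or dictionary. The sketch also assumes that "overall accuracy" means the fraction of test words whose predicted stem matches the expected stem.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative inflectional suffixes (an assumption, not the paper's rule set),
# ordered longest first: "وں" (oblique plural), "یں" (feminine plural), "ے".
SUFFIXES: Tuple[str, ...] = ("وں", "یں", "ے")


class DictionaryStemmer:
    """Hypothetical dictionary-based stemmer with a dictionary-update feature."""

    def __init__(self, variants: Dict[str, str], stems: Iterable[str]) -> None:
        self.variants = dict(variants)  # surface form -> stem (e.g. irregular forms)
        self.stems = set(stems)         # known base forms

    def stem(self, word: str) -> str:
        # 1. A surface form listed in the dictionary maps directly to its stem.
        if word in self.variants:
            return self.variants[word]
        # 2. The word is already a known stem.
        if word in self.stems:
            return word
        # 3. Strip a suffix, accepting the result only if the dictionary knows it.
        for suffix in SUFFIXES:
            candidate = word[: -len(suffix)]
            if word.endswith(suffix) and candidate in self.stems:
                return candidate
        # 4. Out-of-vocabulary word: leave it unchanged.
        return word

    def update(self, word: str, stem: str) -> None:
        """Dictionary update: record a corrected word -> stem mapping."""
        self.variants[word] = stem
        self.stems.add(stem)


def accuracy(stemmer: DictionaryStemmer, gold: List[Tuple[str, str]]) -> float:
    """Fraction of (word, expected_stem) pairs the stemmer gets right."""
    correct = sum(1 for word, expected in gold if stemmer.stem(word) == expected)
    return correct / len(gold)


if __name__ == "__main__":
    stemmer = DictionaryStemmer({"لڑکے": "لڑکا"}, ["کتاب", "لڑکا"])
    print(stemmer.stem("کتاب" + SUFFIXES[1]))  # suffix stripped, stem found in dictionary
    print(stemmer.stem("لڑکے"))                # form resolved by direct lookup
```

In this sketch, suffix stripping is accepted only when the remaining string is a known stem. This lets the dictionary act as a guard against over-stemming. The `update` method reflects the abstract's point that results can improve as the dictionary grows. For example, "لڑکوں" is left unchanged until a mapping to "لڑکا" is added.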
fatcat:6voiefzk4rd7ddookmrhlnfgkm