1,522 Hits in 5.5 sec

Design and Development of Unsupervised Stemmer for Sindhi Language

Bharti Nathani, Nisheeth Joshi, G.N. Purohit
2020 Procedia Computer Science  
This paper presents a stemmer designed and developed for the Sindhi language using an unsupervised approach. Suffixes are extracted using "Linguistica 5" [22], a tool for unsupervised learning of morphology.  ...  Majgaonker [27] designed a rule-based stemmer and an unsupervised stemmer for the Marathi language and compared their performance on a manually stemmed test dataset of 1,500 words. Gupta et al.  ... 
doi:10.1016/j.procs.2020.03.212 fatcat:bs2mggcwh5bz3oeha25lehmu7u

SALMA: Standard Arabic Language Morphological Analysis

M. Sawalha, E. Atwell, M. A. M. Abushariah
2013 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA)  
The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word,  ...  The SALMA-Tools is a collection of open-source standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of  ...  (iii) It has been reported as a standard for evaluating morphological analyzers for Arabic text and for building a gold standard for evaluating morphological analyzers and part-of-speech taggers for Arabic  ... 
doi:10.1109/iccspa.2013.6487311 fatcat:zyszkduja5gjlkgsnxknpwq7re

Quality Estimation Of Machine Translation Outputs Through Stemming [article]

Pooja Gupta, Nisheeth Joshi, Iti Mathur
2014 arXiv   pre-print
Every day we can see some machine translators being developed, but getting a high-quality automatic translation is still a very distant dream.  ...  In this paper, we focus on the English-Hindi language pair, so in order to preserve the correct MT output we present a ranking system, which employs some machine learning techniques and morphological  ...  [13] proposed A Lightweight Stemmer for Gujarati; they showed an implementation of a rule-based stemmer for Gujarati and created rules for stemming and the richness in morphology.  ... 
arXiv:1407.2694v1 fatcat:ukog2eg3w5a4pkt22kd3gyipoe

Quality Estimation of Machine Translation Outputs Through Stemming

Pooja Gupta, Nisheeth Joshi, Iti Mathur
2014 International Journal on Computational Science & Applications  
Every day we can see some machine translators being developed, but getting a high-quality automatic translation is still a very distant dream.  ...  In this paper, we focus on the English-Hindi language pair, so in order to preserve the correct MT output we present a ranking system, which employs some machine learning techniques and morphological  ...  [13] proposed A Lightweight Stemmer for Gujarati; they showed an implementation of a rule-based stemmer for Gujarati and created rules for stemming and the richness in morphology.  ... 
doi:10.5121/ijcsa.2014.4302 fatcat:2okcf2wv6vdujft3fjhaesbdqa

Parallel hardware for faster morphological analysis

Issam Damaj, Mahmoud Imdoukh, Rached Zantout
2018 Journal of King Saud University: Computer and Information Sciences  
The investigation includes a thorough evaluation of the methodology, and performance and accuracy analyses of the developed software and hardware implementations.  ...  The developed stemmer for verb root extraction with infix processing attained accuracies of 87% and 90.7% for analyzing the texts of the Holy Quran and its Chapter 29 - Surat Al-Ankabut.  ...  A thorough analysis and evaluation is presented in Section 6 including validation and testing, performance analysis, accuracy analysis, and a general evaluation.  ... 
doi:10.1016/j.jksuci.2017.07.003 fatcat:rti3inukvfgzbmo76i3w57z5kq

Influence of GUJarati STEmmeR in Supervised Learning of Web Page Categorization

Chandrakant D. Patel (Research Scholar, Hemchandracharya North Gujarat University, Patan, Gujarat, India), Jayesh M. Patel
2021 International Journal of Intelligent Systems and Applications  
This research work is intended to focus on the analysis of Web Page Categorization (WPC) of the Gujarati language and concentrates on a research problem to verify the influence of a stemming algorithm in  ...  the corpus as a word by word for the given query.  ...  To evaluate this method, a framework for Gujarati WPC is developed that implements the general method and provides support for several algorithms that have been considered for study.  ... 
doi:10.5815/ijisa.2021.03.03 fatcat:hylx7xnbufathfn7gqdzxbzlfy

A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm

Naresh Kumar Nagwani, Shrish Verma
2011 International Journal of Computer Applications  
In this paper a frequent-term-based text summarization algorithm is designed and implemented in Java. The designed algorithm works in three steps.  ...  The designed algorithm is implemented using open source technologies like Java, DISCO, Porter's stemmer etc. and verified over the standard text mining corpus.  ...  Java is a general-purpose, concurrent, class-based, object-oriented language that is specifically designed to have as few implementation dependencies as possible.  ... 
doi:10.5120/2190-2778 fatcat:6hpb3cpnqjh7fcdjnxubzeybka

UniNE at CLEF 2008: TEL, and Persian IR [chapter]

Ljiljana Dolamic, Claire Fautsch, Jacques Savoy
2009 Lecture Notes in Computer Science  
As a second objective we wanted to design and evaluate a stopword list and a light stemming strategy for Persian (Farsi), a member of the Indo-European family of languages and whose morphology is more  ...  records) and also to evaluate the retrieval effectiveness of several IR models.  ...  Introduction: During the last few years, the IR group at University of Neuchatel has focused on designing, implementing and evaluating IR systems for various natural languages, including European [1]  ... 
doi:10.1007/978-3-642-04447-2_22 fatcat:kzkwdal6pfekpfml3to6du4hcq

Towards an error-free Arabic stemming

Eiman Tamah Al-Shammari, Jessica Lin
2008 Proceeding of the 2nd ACM workshop on Improving non english web searching - iNEWS '08  
The ETS stemmer is evaluated by comparison with output from human generated stemming and the stemming weight technique.  ...  The novelty of the work arises from the use of neglected Arabic stop-words. These stop-words can be highly important and can provide a significant improvement to processing Arabic documents.  ...  EVALUATION AND EXPERIMENTS Different criteria are used to evaluate the performance of a stemmer. A good stemmer (by definition) is a stemmer that stems all the words to their correct roots.  ... 
doi:10.1145/1460027.1460030 dblp:conf/cikm/Al-ShammariL08 fatcat:wol556egtzdhtc2zjpn3fkioda

AutoClass: Automatic Text to OOP Concept Identification Model

Fatma Bozyiğit, Özlem Aktaş, Deniz Kılınç
2016 International Journal of Computer Applications  
This paper presents a CASE tool called AutoClass which extracts class diagrams and generates C# source code from the requirement documents.  ...  Natural Language Processing (NLP) techniques and a rule-based model are used to implement the automatic concept identification model in the study.  ...  In the next section, a survey of the related works which implement automatic concept identification is presented.  ... 
doi:10.5120/ijca2016911647 fatcat:5qfrzklysjg7xpt5te2ugxgb4u

A Survey of Common Stemming Techniques and Existing Stemmers for Indian Languages

Vishal Gupta, Gurpreet Singh Lehal
2013 Journal of Emerging Technologies in Web Intelligence  
The design of stemmers is language specific, and requires some to significant linguistic expertise in the language, as well as the understanding of the needs for a spelling checker for that language.  ...  In this paper a survey of common stemming techniques and existing stemmers for Indian languages has been presented.  ... 
doi:10.4304/jetwi.5.2.157-161 fatcat:5f4y4de4qnasbjxp2xtqbdnqmu

Searching strategies for the Hungarian language

Jacques Savoy
2008 Information Processing & Management  
It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective.  ...  Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.  ...  While stemming schemes are normally designed to work with general texts, some may also be especially designed for a specific domain (e.g., in medicine) or a given document collection, such as that developed  ... 
doi:10.1016/j.ipm.2007.01.022 fatcat:2ffg3z4tpjglxhzapjzfl74qui

Improving a Lightweight Stemmer for Gujarati Language

Chandrakant D, Jayeshkumar M. Patel
2016 International Journal of Information Sciences and Techniques  
Establishing an effective stemmer for the Gujarati language has always been a hot research area, since Gujarati has a very different and more difficult structure than other languages due to its rich morphology  ...  It is usually used in several types of applications such as Natural Language Processing (NLP), Information Retrieval (IR) and Text Mining (TM) including Text Categorization (TC), Text Summarization (TS  ...  We also evaluated the new algorithm with an IRS, and precision and recall improved. The implementation of this algorithm is also being tested on different regional languages for further processing.  ... 
doi:10.5121/ijist.2016.6214 fatcat:zfdsa4nw2ndilbxhqerx4sxaiy

Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy

Tarek Kanan, Edward A. Fox
2015 Journal of the Association for Information Science and Technology  
We designed a simple taxonomy for Arabic news stories that is suitable for the needs in Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council  ...  We developed tailored stemming (i.e., a new Arabic light stemmer) and automatic classification methods (the best being binary SVM classifiers) to work with the taxonomy.  ...  Acknowledgments We acknowledge QNRF for their support. This research was made possible by NPRP grant # 4-029-1-007 from the Qatar National Research Fund (a member of Qatar Foundation).  ... 
doi:10.1002/asi.23609 fatcat:lzsmz2t3p5bcpnig4gdgpatkia

An evaluation of conflation accuracy using finite‐state transducers

Carmen Galvez, Félix de Moya‐Anegón
2006 Journal of Documentation  
Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents.  ...  Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval.  ...  At the same time, stemmers are typically easy to implement, and run fast, yet they do not give a high percentage of accuracy, making them inappropriate for some applications.  ... 
doi:10.1108/00220410610666493 fatcat:rcf2r7vxqbbvlcuyvuscy2wopq