Analysis and development of Urdu POS tagged corpus

Ahmed Muaz, Aasim Ali, Sarmad Hussain
2009 Proceedings of the 7th Workshop on Asian Language Resources - ALR7   unpublished
In this paper, two corpora of Urdu (with 110K and 120K words) tagged with different POS tagsets are used to train TnT and Tree taggers.  ...  The existing tagged corpora are tagged with the new tagset to develop a single corpus of 230K words and the TnT tagger is retrained.  ...  Introduction There is increasing amount of work on computational modeling of Urdu language. As various groups work on the language, diversity in analysis is also developed.  ... 
doi:10.3115/1690299.1690303 fatcat:3slezbijzvcz3gu7cpjncwxtky

Performance Comparison of Bootstrapped Statistical Taggers on Urdu Tweets

Amber Baig, Mutee U Rahman, Sehrish Abrejo, Khalid H Mohamadani, Ahsanullah Baloch
2021 International Journal of Scientific and Research Publications (IJSRP)  
The aim of this study is twofold. First, is to investigate how well the statistical taggers developed for POS tagging of structured text fare in the domain of tweet POS tagging.  ...  To this end, Stanford and MorphoDiTa taggers were trained on 500 Urdu tweet gold-standard corpus and were utilized for semi-automatic corpus annotation in bootstrapped fashion.  ...  All faculty members of Department of Computer Science, Isra University are acknowledged for their help and support throughout the course of this study.  ... 
doi:10.29322/ijsrp.11.07.2021.p11559 fatcat:2awecd6mwzgt3clwkcd6ml65ku

Developing a POS Tagged Corpus of Urdu Tweets

Amber Baig, Mutee U Rahman, Hameedullah Kazi, Ahsanullah Baloch
2020 Computers  
We introduce a new tagset for POS-tagging of Urdu tweets along with the POS-tagged Urdu tweets corpus.  ...  Still, no such attempt has been made to develop a POS tagger for Urdu social media content. Thus, the focus of this paper is on POS tagging of Urdu tweets.  ...  Figure 1. Corpus Development Process. Figure 2. An Example of a Part-of-Speech tagged Urdu Tweet.  ... 
doi:10.3390/computers9040090 fatcat:tqbvexu5h5bkzc6gqb4t7zi2wi

Transtech: development of a novel translator for Roman Urdu to English

Hafsa Masroor, Muhammad Saeed, Maryam Feroz, Kamran Ahsan, Khawar Islam
2019 Heliyon  
Self-maintained corpus along with its corresponding tag-set is used for tokenization. The syntactical structure is covered by writing Urdu POS tagger based on grammatical rules.  ...  The objective of this research is to develop and test a novel tactic to solve the issue of translation from Roman Urdu to the English language.  ...  It consists of terminals (POS tags) and non-terminals, which generates a set of production rules.  ... 
doi:10.1016/j.heliyon.2019.e01780 pmid:31193721 pmcid:PMC6538981 fatcat:aqcvtgoiobcuzhfvrwwdsztcnq

Development of Saraiki WordNet by Mapping of Word Senses: A Corpus-based Approach

Sarah Gul, Musarrat Azher, Sana Nawaz
2021 Linguistics and Literature Review  
This paper aimed to develop the Saraiki WordNet. Saraiki is one of the regional languages spoken in Pakistan and has a unique history of its own.  ...  This study may aid in creating bilingual dictionaries (of Saraiki and Urdu?) in the future. Keywords: expand approach, mapping, Saraiki language, WordNet  ...  These POS tag sets helped in tagging data and categorized them in proper grammatical categories. POS tag sets can be developed from scratch or they can be downloaded as well.  ... 
doi:10.32350/llr.72/04 fatcat:7bvuzavwdbb7dpf6p53qyldeby

Building a Hierarchical Annotated Corpus of Urdu: The URDU.KON-TB Treebank [chapter]

Qaiser Abbas
2012 Lecture Notes in Computer Science  
Part of speech (POS) tagging and annotation of a selected set of sentences from different sub-domains of this corpus is in process manually and the work performed to date is presented here.  ...  Urdu is a comparatively under resourced language and the development of a reliable treebank for Urdu will have significant impact on the state-of-the-art for Urdu language processing.  ...  Miriam Butt, University of Konstanz for her encouragement, guidance and support.  ... 
doi:10.1007/978-3-642-28604-9_6 fatcat:o4fbizkjqngstj42yshpvlrnoq

A survey on sentiment analysis in Urdu: A resource-poor language

Asad Khattak, Muhammad Zubair Asghar, Anam Saeed, Ibrahim A. Hameed, Syed Asif Hassan, Shakeel Ahmad
2020 Egyptian Informatics Journal  
The primary goal of this study is to present state-of-art survey for identifying the progress and shortcomings saddling Urdu sentiment analysis and propose rectifications.  ...  An evaluation of sophisticated lexical resources including corpuses and lexicons was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, negations  ...  This Research work was supported by Zayed University Research Incentives Fund#R18052, co-funded by Norwegian university of science and technology, Ålesund, Norway.  ... 
doi:10.1016/j.eij.2020.04.003 fatcat:qvymechpvnhypg2telxbs4wj4m

Corpus-based study of the vocabulary profile of the Shahmukhi Punjabi language

Muhammad Farukh Arslan, Muhammad Asim Mehmood, Shaukat Hayat
2019 Dilemas Contemporáneos: Educación, Política y Valores  
It has been observed that words in the Punjabi language have many different cases and forms, unlike English and similar to Urdu.  ...  A Shahmukhi Punjabi corpus was transcribed into Gurmukhi Punjabi for part-of-speech tagging. The corpus was analyzed with the help of AntConc.  ...  After assigning POS tags the corpus was transliterated back to Shahmukhi Punjabi and VP is developed.  ... 
doi:10.46377/dilemas.v27i1.1582 fatcat:b2vhauwvs5d33hgl4ztn6gozwe

An Information-Extraction System for Urdu---A Resource-Poor Language

Smruthi Mukund, Rohini Srihari, Erik Peterson
2010 ACM Transactions on Asian Language Information Processing  
NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging.  ...  The objective of this work is to develop an NLP infrastructure for Urdu that is customizable and capable of providing basic analysis on which more advanced information extraction tools can be built.  ...  The CRULP dataset (dataset POS ) is a corpus of 150,000 words that are only POS tagged and the CRL dataset (dataset NE ) is a corpus of 50,000 words that are only NE tagged.  ... 
doi:10.1145/1838751.1838754 fatcat:ibmmwalmtfbfdpjufxccwolzgq

Tagging Urdu Sentences from English POS Taggers

Adnan Naseem, Muzamma Anwar, Salman Ahmed, Qadeem Akhtar
2017 International Journal of Advanced Computer Science and Applications  
The two best English POS Taggers which tagged Urdu sentences were Stanford POS Tagger and MBSP POS Tagger with an accuracy of 96.4% and 95.7%, respectively.  ...  State-of-the-art English POS Taggers were explored from the literature, however, 11 famous POS Taggers were being input to Urdu sentences for tagging.  ...  Salience Analysis of Urdu News Corpus 85.5 [21] Saliences in the Urdu language 22 Efficient methods of computational linguistics.  ... 
doi:10.14569/ijacsa.2017.081030 fatcat:cmdv722jfnfddip26555q7ar74

Universal Dependencies for Urdu Noisy Text

2021 International Journal of Advanced Trends in Computer Science and Engineering  
The 500 Urdu tweets treebank is created by manually annotating the treebank with lemma, POS tags, morphological and syntactic relations using the Universal Dependencies annotation scheme, adapted to the peculiarities  ...  of Urdu social media text. Annotation process is evaluated through Inter-annotator agreement for dependency relations and total agreement of 94.5% and resultant weighted Kappa = 0.876 was observed.  ...  of dependency parses to tweets extracted from [18]'s PoS-tagged Twitter corpus. [2] used bootstrapping for developing a UD based Arabic tweets dependency treebank.  ... 
doi:10.30534/ijatcse/2021/371032021 fatcat:raiscg6okrgchcb6aj4xy3224m

A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation

Ali Saeed, Rao Muhammad Adeel Nawab, Mark Stevenson, Paul Rayson
2019 ACM Transactions on Asian and Low-Resource Language Information Processing  
The corpus contains 5,042 words of Urdu running text in which all ambiguous words (856 instances) are manually tagged with senses from the Urdu Lughat dictionary.  ...  A range of baseline WSD models based on n-grams are applied to the corpus and the best performance (accuracy of 57.71%) is achieved using word 4-grams.  ...  It contains 95.4 million Urdu words and 5.4 million sentences. The UrMono corpus is tokenized and POS tagged using the CLE POS tagset [58] with an accuracy of 87.98%.  ... 
doi:10.1145/3314940 fatcat:awjrtihv6vew7eusjakt4snxjy

Urdu Word Segmentation using Machine Learning Approaches

Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah, Burhan Ullah
2018 International Journal of Advanced Computer Science and Applications  
Compared to Urdu, the tools and resources developed for word segmentation of English and other English-like Western languages have record-setting performance.  ...  The main areas which can be benefited from Word segmentation are IR, POS, NER, sentiment analysis, etc. Urdu Word Segmentation is a challenging task.  ...  For POS tag information we used CLE POS tagged corpus and for NE information we used the UNER dataset [26].  ... 
doi:10.14569/ijacsa.2018.090628 fatcat:6korgkh27rcrvbyflcnr4hapfq

Towards Silver Standard Dependency Treebank of Urdu Tweets

2021 International Journal of Advanced Trends in Computer Science and Engineering  
of 4500 Urdu tweets.  ...  This paper describes the experiments carried out using semi-automatic methods like self-training and co-training in an attempt for creating silver-standard dependency treebank of Urdu tweets.  ...  Parsito is now part of the UDPipe [35] parsing pipeline, which handles tokenization, morphological analysis, and POS tagging.  ... 
doi:10.30534/ijatcse/2021/1501032021 fatcat:w5wrrbnoxndk5d3vh3bioahn64

Probabilistic Context Free Grammar for Urdu

Neelam Mukhtar, Mohammad Abid Khan, Fatima TuzZuhra (Department of Computer Science, University of Peshawar - Khyber Pukhtoon khawa, Pakistan)
2016 Linguistics and Literature Review  
This PCFG can be used by the probabilistic parser for Urdu (that is to be developed) that accepts POS tagged text as input and generates the structure of that text.  ...  In Urdu, research is done from different point of views such as creating an Urdu corpus (Samin, Nisar & Sehrai, 2006; Becker & Riaz, 2002) and tagging the Urdu corpus (Anwar, Wang, Luli and Wang, 2007  ... 
doi:10.32350/llr.22.04 fatcat:5kdfwjwxtvcbfj4ifknhf6xssm