Analysis and development of Urdu POS tagged corpus
2009
Proceedings of the 7th Workshop on Asian Language Resources - ALR7
unpublished
In this paper, two corpora of Urdu (with 110K and 120K words) tagged with different POS tagsets are used to train TnT and Tree taggers. ...
The existing tagged corpora are tagged with the new tagset to develop a single corpus of 230K words and the TnT tagger is retrained. ...
Introduction There is increasing amount of work on computational modeling of Urdu language. As various groups work on the language, diversity in analysis is also developed. ...
doi:10.3115/1690299.1690303
fatcat:3slezbijzvcz3gu7cpjncwxtky
Performance Comparison of Bootstrapped Statistical Taggers on Urdu Tweets
2021
International Journal of Scientific and Research Publications (IJSRP)
The aim of this study is twofold. First, is to investigate how well the statistical taggers developed for POS tagging of structured text fare in the domain of tweet POS tagging. ...
To this end, Stanford and MorphoDiTa taggers were trained on 500 Urdu tweet gold-standard corpus and were utilized for semi-automatic corpus annotation in bootstrapped fashion. ...
All faculty members of Department of Computer Science, Isra University are acknowledged for their help and support throughout the course of this study. ...
doi:10.29322/ijsrp.11.07.2021.p11559
fatcat:2awecd6mwzgt3clwkcd6ml65ku
Developing a POS Tagged Corpus of Urdu Tweets
2020
Computers
We introduce a new tagset for POS-tagging of Urdu tweets along with the POS-tagged Urdu tweets corpus. ...
Still, no such attempt has been made to develop a POS tagger for Urdu social media content. Thus, the focus of this paper is on POS tagging of Urdu tweets. ...
Figure 1. Corpus Development Process.
Figure 2. An Example of a Part-of-Speech tagged Urdu Tweet. ...
doi:10.3390/computers9040090
fatcat:tqbvexu5h5bkzc6gqb4t7zi2wi
Transtech: development of a novel translator for Roman Urdu to English
2019
Heliyon
A self-maintained corpus along with its corresponding tag-set is used for tokenization. The syntactic structure is covered by writing an Urdu POS tagger based on grammatical rules. ...
The objective of this research is to develop and test a novel tactic to solve the issue of translation from Roman Urdu to the English language. ...
It consists of terminals (POS tags) and non-terminals, which generates a set of production rules. ...
doi:10.1016/j.heliyon.2019.e01780
pmid:31193721
pmcid:PMC6538981
fatcat:aqcvtgoiobcuzhfvrwwdsztcnq
Development of Saraiki WordNet by Mapping of Word Senses: A Corpus-based Approach
2021
Linguistics and Literature Review
This paper aimed to develop the Saraiki WordNet. Saraiki is one of the regional languages spoken in Pakistan and has a unique history of its own. ...
This study may aid in creating bilingual dictionaries (of Saraiki and Urdu?) in the future. Keywords: expand approach, mapping, Saraiki language, WordNet ...
These POS tag sets helped in tagging data and categorized them in proper grammatical categories. POS tag sets can be developed from scratch or they can be downloaded as well. ...
doi:10.32350/llr.72/04
fatcat:7bvuzavwdbb7dpf6p53qyldeby
Building a Hierarchical Annotated Corpus of Urdu: The URDU.KON-TB Treebank
[chapter]
2012
Lecture Notes in Computer Science
Part of speech (POS) tagging and annotation of a selected set of sentences from different sub-domains of this corpus is in process manually and the work performed till to date is presented here. ...
Urdu is a comparatively under resourced language and the development of a reliable treebank for Urdu will have significant impact on the state-of-the-art for Urdu language processing. ...
Miriam Butt, University of Konstanz for her encouragement, guidance and support. ...
doi:10.1007/978-3-642-28604-9_6
fatcat:o4fbizkjqngstj42yshpvlrnoq
A survey on sentiment analysis in Urdu: A resource-poor language
2020
Egyptian Informatics Journal
The primary goal of this study is to present state-of-art survey for identifying the progress and shortcomings saddling Urdu sentiment analysis and propose rectifications. ...
An evaluation of sophisticated lexical resources including corpuses and lexicons was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, negations ...
This Research work was supported by Zayed University Research Incentives Fund#R18052, co-funded by Norwegian university of science and technology, Ålesund, Norway. ...
doi:10.1016/j.eij.2020.04.003
fatcat:qvymechpvnhypg2telxbs4wj4m
A Corpus-Based Study of the Vocabulary Profile of the Shahmukhi Punjabi Language
2019
Dilemas Contemporáneos: Educación, Política y Valores
It has been observed that words in the Punjabi language have many different cases and forms, unlike English and similar to Urdu. ...
A corpus of Shahmukhi Punjabi was transcribed into Gurmukhi Punjabi for part-of-speech tagging. The corpus was analyzed with the help of Antconc. ...
After assigning POS tags the corpus was transliterated back to Shahmukhi Punjabi and VP is developed. ...
doi:10.46377/dilemas.v27i1.1582
fatcat:b2vhauwvs5d33hgl4ztn6gozwe
An Information-Extraction System for Urdu---A Resource-Poor Language
2010
ACM Transactions on Asian Language Information Processing
NLP systems begin with modules such as word segmentation, part-of-speech tagging, and morphological analysis and progress to modules such as shallow parsing and named entity tagging. ...
The objective of this work is to develop an NLP infrastructure for Urdu that is customizable and capable of providing basic analysis on which more advanced information extraction tools can be built. ...
The CRULP dataset (dataset POS) is a corpus of 150,000 words that are only POS tagged and the CRL dataset (dataset NE) is a corpus of 50,000 words that are only NE tagged. ...
doi:10.1145/1838751.1838754
fatcat:ibmmwalmtfbfdpjufxccwolzgq
Tagging Urdu Sentences from English POS Taggers
2017
International Journal of Advanced Computer Science and Applications
The two best English POS Taggers which tagged Urdu sentences were Stanford POS Tagger and MBSP POS Tagger with an accuracy of 96.4% and 95.7%, respectively. ...
State-of-the-art English POS Taggers were explored from the literature, however, 11 famous POS Taggers were being input to Urdu sentences for tagging. ...
Salience Analysis of Urdu News Corpus 85.5 [21] ... Saliences in the Urdu language ... Efficient methods of computational linguistics. ...
doi:10.14569/ijacsa.2017.081030
fatcat:cmdv722jfnfddip26555q7ar74
Universal Dependencies for Urdu Noisy Text
2021
International Journal of Advanced Trends in Computer Science and Engineering
The 500 Urdu tweets treebank is created by manually annotating the treebank with lemma, POS tags, morphological and syntactic relations using the Universal Dependencies annotation scheme, adopted to the peculiarities ...
of Urdu social media text. annotation process is evaluated through Inter-annotator agreement for dependency relations and total agreement of 94.5% and resultant weighted Kappa = 0.876 was observed. ...
of dependency parses to tweets extracted from [18]'s PoS-tagged Twitter corpus. [2] used bootstrapping for developing a UD based Arabic tweets dependency treebank. ...
doi:10.30534/ijatcse/2021/371032021
fatcat:raiscg6okrgchcb6aj4xy3224m
A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation
2019
ACM Transactions on Asian and Low-Resource Language Information Processing
The corpus contains 5,042 words of Urdu running text in which all ambiguous words (856 instances) are manually tagged with senses from the Urdu Lughat dictionary. ...
A range of baseline WSD models based on n-grams are applied to the corpus and the best performance (accuracy of 57.71%) is achieved using word 4-grams. ...
It contains 95.4 million Urdu words and 5.4 million sentences. The UrMono corpus is tokenized and POS tagged using the CLE POS tagset [58] with an accuracy of 87.98%. ...
doi:10.1145/3314940
fatcat:awjrtihv6vew7eusjakt4snxjy
Urdu Word Segmentation using Machine Learning Approaches
2018
International Journal of Advanced Computer Science and Applications
Compared to Urdu, the tools and resources developed for word segmentation of English and English like other western languages have record-setting performance. ...
The main areas which can be benefited from Word segmentation are IR, POS, NER, sentiment analysis, etc. Urdu Word Segmentation is a challenging task. ...
For POS tag information we used CLE POS tagged corpus and for NE information we used the UNER dataset [26]. ...
doi:10.14569/ijacsa.2018.090628
fatcat:6korgkh27rcrvbyflcnr4hapfq
Towards Silver Standard Dependency Treebank of Urdu Tweets
2021
International Journal of Advanced Trends in Computer Science and Engineering
of 4500 Urdu tweets. ...
This paper describes the experiments carried out using semi-automatic methods like self-training and co-training in an attempt for creating silver-standard dependency treebank of Urdu tweets. ...
Parsito is now part of the UDPipe [35] parsing pipeline, which handles tokenization, morphological analysis, and POS tagging. ...
doi:10.30534/ijatcse/2021/1501032021
fatcat:w5wrrbnoxndk5d3vh3bioahn64
Probabilistic Context Free Grammar for Urdu
2016
Linguistics and Literature Review
This PCFG can be used by the probabilistic parser for Urdu (that is to be developed) that accepts POS tagged text as input and generates the structure of that text. ...
In Urdu, research is done from different point of views such as creating an Urdu corpus (Samin, Nisar & Sehrai, 2006; Becker & Riaz, 2002) and tagging the Urdu corpus (Anwar, Wang, Luli and Wang, 2007 ...
doi:10.32350/llr.22.04
fatcat:5kdfwjwxtvcbfj4ifknhf6xssm