
A standard tag set expounding traditional morphological features for Arabic language part-of-speech tagging

Majdi Sawalha, Eric Atwell
2013 Word Structure  
Many linguistic analyses use part-of-speech tagged corpora to analyse text and extract information, where part-of-speech tags play an essential role in classifying text and direct search to the actions  ...  The SNoW-based Part of Speech Tagger 3 and LBJ Part of Speech Tagger 4 make use of the Sequential Model.  ...  tag set for Arabic Part-of-Speech taggers and tagged corpora.  ... 
doi:10.3366/word.2013.0035 fatcat:7qks5fcqdfhqtikplginoqzgq4

SALMA: Standard Arabic Language Morphological Analysis

M. Sawalha, E. Atwell, M. A. M. Abushariah
2013 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA)  
The SALMA-Tools is a collection of open-source standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of  ...  The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word,  ...  (ii) Parts of The SALMA Tag Set were also used in the Arabic morphological analyzer and part-of-speech tagger Qutuf [38].  ... 
doi:10.1109/iccspa.2013.6487311 fatcat:zyszkduja5gjlkgsnxknpwq7re

POS Tagging without a Tagger: Using Aligned Corpora for Transferring Knowledge to Under-Resourced Languages

Ines Turki Khemakhem, Salma Jamoussi, Abdelmajid Ben Hamadou
2016 Journal of Computacion y Sistemas  
The experimentation of the proposed approach is performed for a pair of languages: English as a rich-resourced language and Arabic as an under-resourced language.  ...  The task is important because it assigns to words tags that highlight their morphological features by considering the corresponding contexts.  ...  In order to evaluate our approach, we need a part-of-speech tagger to compare the results. We use MADA [8], a supervised part-of-speech tagger, to determine POS tags for Arabic.  ... 
doi:10.13053/cys-20-4-2430 fatcat:vtj233ntnnck5mw5oaapqoxpii


Jamilu Awwalu, Saleh El-Yakub Abdullahi, Abraham Eseoghene Evwiekpaefe
2020 FUDMA Journal of Sciences  
This paper presents a review of parts of speech tagging, comparison of different tagging techniques, their characteristics, difficulties, limitation, and Multilingual Parts of Speech (POS) tagging approaches  ...  Internet surfing and social networking has made interactions between people and computers very easy, where people can communicate using their languages thus making processing of these languages a useful  ...  POS tagging works by assigning parts of speech label to words given in a text (Pandian and Geetha, 2008).  ... 
doi:10.33003/fjs-2020-0402-325 fatcat:s75wxf4xhzek5av7gzolv7fany

Automatic Arabic Part-of-Speech Tagging: Deep Learning Neural LSTM Versus Word2Vec

Khwlah Alrajhi et al.
2019 International Journal of Computing and Digital Systems  
Part-of-speech (POS) tagging is the process of selecting an appropriate POS tag for each word in a natural language sentence.  ...  It is interesting to note that LSTM tagger achieved 99.72% accuracy for tagging morphemes and 99.18% for tagging words, while the Word2Vec tagger achieved 99.55% for tagging morphemes and 97.33% for tagging  ...  One of the fundamental tasks in NLP is part-of-speech (POS) tagging, which is the process of identifying the type (tag) of a given word, such as a noun, verb, pronoun, or adverb, in an input sentence  ... 
doi:10.12785/ijcds/080310 fatcat:gsedjomen5hifcsbsc5khik4ra

Building Arabic Corpora: Concepts, Methodologies, Tools, and Experiments

Imad Zeroual, Abdelhak Lakhouaja
2019 Zenodo  
The prime motivation for carrying out the research in this thesis comes from the limited research on Arabic corpus linguistics and the lack of available resources, standards, and efficient tools that can  ...  The term corpus comes from Latin and means "body". According to corpus linguists, a corpus can be defined as a collection of machine-readable authentic texts, including transcripts of spoken data.  ...  As mentioned, Sinclair (2005) formulates the overall instructions proposed by the previous authors in ten fundamental criteria to follow in the design and the compilation of a general corpus:  ... 
doi:10.5281/zenodo.4441159 fatcat:nwix7lrzrbaxpgasing7mgdtwq

Feature-rich PoS Tagging through Taggers Combination : Experience in Arabic

Imad Zeroual, Abdelhak Lakhouaja
2017 Transactions on Machine Learning and Artificial Intelligence  
Part of Speech (PoS) tagging is the task which manages this issue.  ...  The experiments are applied to one of the morphologically complex languages, Arabic. In this paper, we highlight the use of these taggers via various experiments.  ...  Introduction The Part of Speech tagging is the basis of well-known natural language processing (NLP) fields.  ... 
doi:10.14738/tmlai.54.2981 fatcat:nvyglgew3bhhroc6rwtljfcdei

Prosody prediction for arabic via the open-source boundary-annotated qur'an corpus

M. S. Sawalha, C. Brierley, E. Atwell
2021 Journal of Speech Sciences  
To develop phrase break classifiers, we need a boundary-annotated and part-of-speech tagged corpus.  ...  We then use this dataset to train, test, and compare two probabilistic taggers (trigram and HMM) for Arabic phrase break prediction, where the task is to predict boundary locations in an unseen test set  ...  To build the Boundary-Annotated Quran Corpus, we extracted, processed and extended data from two online sources: the Tanzil Quran project (27) and the Quranic Arabic Corpus (12, 26, 36).  ... 
doi:10.20396/joss.v2i2.15038 fatcat:nivlypokcvgo5km3hws4yl35uu

BAAC: Bangor Arabic Annotated Corpus

Ibrahim S Alkhazi, William J.
2018 International Journal of Advanced Computer Science and Applications  
The corpus was used to evaluate the widely used Madamira Arabic part-of-speech tagger and to further investigate compression models for text compressed using part-of-speech tags.  ...  This paper describes the creation of the new Bangor Arabic Annotated Corpus (BAAC) which is a Modern Standard Arabic (MSA) corpus that comprises 50K words manually annotated by parts-of-speech.  ...  One example of those tasks is parts-of-speech tagging (POS) of the Arabic language as reported in [10], [12], [13], where the performance of the taggers is best when tagging MSA text.  ... 
doi:10.14569/ijacsa.2018.091120 fatcat:bbrxyukzbvahjbrkhjvmbpb7hm


2014 Journal of Computer Science  
Part Of Speech (POS) tagging forms the important preprocessing step in many of the natural language processing applications such as text summarization, question answering and information retrieval system  ...  It is the process of classifying every word in a given context to its appropriate part of speech. Different POS tagging techniques in the literature have been developed and experimented.  ...  Arabic part-of-speech-tagging using transformation-based learning. The University of Manchester. Albared, M., N. Omar, and M.J.A. Aziz, 2009. Arabic part of speech disambiguation: A survey. Int. Rev.  ... 
doi:10.3844/jcssp.2014.1865.1873 fatcat:p4v35e7robgfpba7zv3hm77vhq

Part-of-speech tagging of Modern Hebrew text

2007 Natural Language Engineering  
Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a Part-of-Speech (POS) category.  ...  In the morphology and orthography of Arabic and Hebrew, words are often formed by concatenating smaller parts, which function as free morpho-syntactic units, each of which with its own POS tag.  ...  The third author is grateful for the support of the Netherlands Institute of Advanced Studies (NIAS), where part of this work was carried out.  ... 
doi:10.1017/s135132490700455x fatcat:xkioggki3vgvzmetdd2et3xkmy

Statistical Parsing by Machine Learning from a Classical Arabic Treebank [article]

Kais Dukes
2015 arXiv   pre-print
Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran.  ...  Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order.  ...  The question that divides us is whether it is crazy enough to have a chance of being correct. -Niels Bohr  ... 
arXiv:1510.07193v1 fatcat:mkx5gtgehrgfjjy5xmwowxgasq

Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities

A. Zeldes, C. T. Schroeder
2015 Digital Scholarship in the Humanities  
This paper motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic.  ...  tagger applying a fine grained and coarse grained set of tags within and outside the domain of literary texts.  ...  This article aims to contribute to the new wave of Digital Coptic studies by presenting and evaluating a comprehensive part of speech (POS) tagging schema for Sahidic Coptic, the classical dialect of the  ... 
doi:10.1093/llc/fqv043 dblp:journals/lalc/ZeldesS15 fatcat:7e37d2y4qbbxri6e4j5w5mcmny

A New Question Answering System for the Arabic Language

Ghassan Kanaan, Awni Hammouri, Riyad Al-Shalabi, Majdi Swalha
2009 American Journal of Applied Sciences  
In order to perform this process, we used an existing tagger to identify proper names and other crucial lexical items and build lexical entries.  ...  Also provide an analysis of Arabic question forms and attempt to formulate better kinds of answers that users find more appropriate.  ...  ACKNOWLEDGMENT The authors would like to thank Professor Martha Evens from Illinois Institute of Technology for her helpful comments and suggestions that reflected many improvements in the presentation  ... 
doi:10.3844/ajas.2009.797.805 fatcat:oe47iadxhveovdnoufzv6565ly

A New Question Answering System for the Arabic Language

2009 American Journal of Applied Sciences  
In order to perform this process, we used an existing tagger to identify proper names and other crucial lexical items and build lexical entries.  ...  Also provide an analysis of Arabic question forms and attempt to formulate better kinds of answers that users find more appropriate.  ...  ACKNOWLEDGMENT The authors would like to thank Professor Martha Evens from Illinois Institute of Technology for her helpful comments and suggestions that reflected many improvements in the presentation  ... 
doi:10.3844/ajassp.2009.797.805 fatcat:ta2n5eace5g4rjdsf3ohw5jakq