81 Hits in 3.8 sec

Determining the Multiword Expression Inventory of a Surprise Language

Bahar Salehi, Paul Cook, Timothy Baldwin
2016 International Conference on Computational Linguistics  
Much previous research on multiword expressions (MWEs) has focused on the token- and type-level tasks of MWE identification and extraction, respectively.  ...  Our proposed model is trained on a treebank with MWE relations of a source language, and can be applied to the monolingual corpus of the surprise language to identify its MWE construction types.  ...  development of this work.  ... 
dblp:conf/coling/SalehiCB16 fatcat:kyko72iiizaujjzgfsai7egua4

A Combination of Frequent Pattern Mining and Graph Traversal Approaches for Aspect Elicitation in Customer Reviews

Sepideh Jamshidi-Nejad, Fatemeh Ahmadi-Abkenari, Pyman Bayat
2020 IEEE Access  
Razavi and Asadpour (2017) proposed an unsupervised approach for aspect identification based on an embedding word method in Persian reviews.  ...  Since verbs in a Persian sentence appear at the end of sentences, so visiting a node with verb class indicates reaching the end of a sentence in ADG graph.  ... 
doi:10.1109/access.2020.3017486 fatcat:ymhr3z6aprbr3fkhthqrtykuqm

Morphological Networks for Persian and Turkish: What Can Be Induced from Morpheme Segmentation?

Hamid Haghdoost, Ebrahim Ansari, Zdeněk Žabokrtský, Mahshid Nikravesh, Mohammad Mahmoudi
2020 Prague Bulletin of Mathematical Linguistics  
For this purpose, we use existing morpheme-segmentation tools, namely supervised and unsupervised version of Morfessor, and (unsupervised) MorphSyn.  ...  We use our large hand-segmented set of word forms in the experiments with Persian, which is contrasted with employing only a very limited manually segmented lexicon for Turkish that existed previously.  ...  CZ.02.2.69/0.0/0.0/16_027/0008495, International Mobility of Researchers at Charles University, and by grant No. 19-14534S of the Grant Agency of the Czech Republic. It has been using language re-  ... 
doi:10.14712/00326585.007 fatcat:sduh5n263ndkbfkii7vzn6sqgi

Exploiting multilingual lexical resources to predict MWE compositionality [chapter]

Bahar Salehi, Paul Cook, Timothy Baldwin
2018 Zenodo  
We evaluate these methods over English noun compounds, English verb-particle constructions, and German noun compounds.  ...  MWEs in a wide range of languages.  ...  Given that we do not perform any lemmatisation or other language-specific preprocessing, we inevitably achieve low recall for the identification of noun compound tokens, although the precision should be  ... 
doi:10.5281/zenodo.1469572 fatcat:vfkhpas52fcd7ha5kzlh4zcmyi

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

L. Jafar Tafreshi, F. Soltanzadeh
2020 Journal of Artificial Intelligence and Data Mining  
The result of this study showed that our approach achieved 86.86% precision, 80.29% recall and 83.44% F-measure which are relatively higher than those values reported for other Persian NER methods.  ...  Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text.  ...  Acknowledgments This project was funded by Computer Research Center of Islamic Sciences (CRCIS). We appreciate the colleagues who helped us in this project: Morteza Rezaei-Sharifabadi, Dr.  ... 
doi:10.22044/jadm.2019.8430.1980 doaj:cc4826ccf7f54ddd93343ebf7658788c fatcat:vk6eb2cvn5g3jeh65wqlg5l6uu

A Survey on sentiment analysis in Persian: A Comprehensive System Perspective Covering Challenges and Advances in Resources, and Methods [article]

Zeinab Rajabi, MohammadReza Valavi
2021 arXiv   pre-print
The main target of this paper is to provide a comprehensive literature survey for state-of-the-art advances in Persian sentiment analysis.  ...  Then, a detailed survey of the sentiment analysis methods used for Persian texts is presented, and previous relevant works on Persian Language are discussed.  ...  Human and Animal Rights This article does not contain any studies with human or animal subjects performed by any of the authors.  ... 
arXiv:2104.14751v1 fatcat:ftt5inmi6ngvnngneyc2rp25by

Issues In Malayalam Text Summarization

D. K. Kanitha, D. Muhammad Noorul Mubarak, S. A. Shanavas
2018 Zenodo  
Text Summarization is the process of creates an abridged version of the original text and it covers overall idea about the document. The human summarization requires lot of time and effort.  ...  At the same time summarization system produce summary within a short span of time. It generates summaries or abstracts of large documents.  ...  So high chance in formulate compound word and a proper compound word splitter is necessary.  Word and Sentence Boundary Identification: It is difficult Malayalam documents because proper identification  ... 
doi:10.5281/zenodo.1205084 fatcat:orxml35bknephbm63ik5kjhlri

Automatic Lexical Simplification for Turkish [article]

Ahmet Yavuz Uluslu
2022 arXiv   pre-print
Turkish is a morphologically rich agglutinative language that requires unique considerations such as the proper handling of inflectional cases.  ...  Being a low-resource language in terms of available resources and industrial-strength tools, it makes the text simplification task harder to approach.  ...  Since the word has several meanings and repeatedly used in compound verbs (hak etmek, hakkı olmak, hak görmek) and nouns (hak sahibi, miras hakkı), it frequently appears inside the corpus.  ... 
arXiv:2201.05878v2 fatcat:oafusm7bbna4pey4cwzzyeldoe

Multi-word Entity Classification in a Highly Multilingual Environment

Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
2017 Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)  
resources to predict the compositionality of MWEs. The program also included a panel discussion on the future directions of the MWE community and the SIGLEX Section.  ...  We would like to thank the members of the program committee for the timely reviews, authors for their valuable contributions, shared task organizers, annotators, and system developers for their hard work  ...  This work has been using language resources developed and/or stored and/or distributed by the LINDAT-Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic, project No.  ... 
doi:10.18653/v1/w17-1702 dblp:conf/mwe/ChesneyJSP17 fatcat:bv7aavgth5eurmzuphuowtuuhq

Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation

Noushin Riahi, Fatemeh Sedghi
2016 Journal of Computer and Communications  
One of the ways of disambiguation is the use of decision list algorithm which is a supervised method.  ...  The proposed method in this article improves the efficiency of this algorithm where there is a small tagged training set.  ...  Identification of such collocations is possible by a big untagged corpus.  ... 
doi:10.4236/jcc.2016.44010 fatcat:h5snnpi6inedzatyt7wt65nvda

An Information-Extraction System for Urdu---A Resource-Poor Language

Smruthi Mukund, Rohini Srihari, Erik Peterson
2010 ACM Transactions on Asian Language Information Processing  
There has been an increase in the amount of multilingual text on the Internet due to the proliferation of news sources and blogs.  ...  All of this requires a robust NLP system.  ...  The vocabulary of Urdu is highly influenced by Arabic and Persian, while Hindi is influenced by Sanskrit. (3) Missing diacritics problem.  ... 
doi:10.1145/1838751.1838754 fatcat:ibmmwalmtfbfdpjufxccwolzgq

Program Committee

2006 2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation  
Including the papers on embeddings, there were 15 rejections: the acceptance rate for full papers was 58% a sign of the consistently high quality of papers submitted to the conference.  ...  Corpus-based Learning of Lexical Semantic Relations by Vered Shwartz (Bar-Ilan University), Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources by Alexander Panchenko (University  ...  the University of Zagreb.  ... 
doi:10.1109/scam.2006.23 dblp:conf/scam/X06c fatcat:2dhsf7loj5hlffu2jxpmlo2qcq

Paraphrasing, textual entailment, and semantic similarity above word level [article]

Venelin Kovatchev
2022 arXiv   pre-print
In Part II: "Paraphrase Typology and Paraphrase Identification", I focus on the meaning relation of paraphrasing and the empirical task of automated Paraphrase Identification (PI).  ...  In Part I: "Similarity at the Level of Words and Phrases", I study the Distributional Hypothesis (DH) and explore several different methodologies for quantifying semantic similarity at the levels of words  ...  Acknowledgments First of all, I'd like to thank my partner, Mila. She has been with me every step of the way, through deadlines, submissions, acceptances, and rejections.  ... 
arXiv:2208.05387v1 fatcat:bfan3d4tzfcgngfyn67jhkqbqu

Topic Modeling for Native Language Identification

Sze-Meng Jojo Wong, Mark Dras, Mark Johnson
2011 Australasian Language Technology Association Workshop  
Native language identification (NLI) is the task of determining the native language of an author writing in a second language.  ...  Several pieces of earlier work have found that features such as function words, part-of-speech n-grams and syntactic structure are helpful in NLI, perhaps representing characteristic errors of different  ...  Acknowledgments We acknowledge the support of ARC grant LP0776267. We also thank the anonymous reviewers, particularly for insightful critiques of the analysis of the topic models.  ... 
dblp:conf/acl-alta/WongDJ11 fatcat:yziqb3wjtfa7de3qau7qsiegru

Eskander_columbia_0054D_16928.pdf [article]

2021
In this work, we propose new fully unsupervised approaches for two tasks in morphology: unsupervised morphological segmentation and unsupervised cross-lingual part-of-speech (POS) tagging, which have been  ...  Unsupervised Morphological Segmentation and Part-of-Speech Tagging for Low-Resource Scenarios Ramy Eskander With the high cost of manually labeling data and the increasing interest in low-resource languages  ...  Arabic is also the best source language for tagging verbs in Persian.  ... 
doi:10.7916/d8-kz6e-7627 fatcat:p4tzacmywfgxffkumbf7ve2v7q