25,298 Hits in 4.1 sec

Including Dialects and Language Varieties in Author Profiling [article]

Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu
2017 arXiv   pre-print
This paper presents a computational approach to author profiling taking gender and language variety into account.  ...  Our approach achieved 75% accuracy on gender identification on tweets written in four languages and 97% accuracy on language variety identification for Portuguese.  ...  Special thanks to Martin Potthast and Francisco Rangel for replying promptly to all our inquiries and to Paolo Rosso for fruitful discussions and interesting insights about author profiling during the  ... 
arXiv:1707.00621v1 fatcat:nokzlwwuq5hgdcfe6g3m4apicu

Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification [article]

Wajdi Zaghouani, Anis Charfi
2018 arXiv   pre-print
The provided corpus will enrich the limited set of available language resources for Arabic and will be an invaluable enabler for developing author profiling tools and NLP tools for Arabic.  ...  In this paper, we present Arap-Tweet, which is a large-scale and multi-dialectal corpus of Tweets from 11 regions and 16 countries in the Arab world representing the major Arabic dialectal varieties.  ...  Dialectal Arabic The Arabic language used in social media and online is a mix of Modern Standard Arabic (MSA) and other regional dialectal varieties.  ... 
arXiv:1808.07674v1 fatcat:73c5hf53czhkjdligud4sfpece

Author Profiling at PAN: from Age and Gender Identification to Language Variety Identification (invited talk)

Paolo Rosso
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
In 2017 the PAN author profiling shared task addresses jointly gender and language variety identification in Twitter where tweets have been annotated with authors' gender and their specific variation of  ...  His research interests include author profiling and irony detection in social media, opinion spam detection, as well as text reuse and plagiarism detection.  ... 
doi:10.18653/v1/w17-1205 dblp:conf/vardial/Rosso17 fatcat:cd3rdqwrczhf7ov36zu4z5p4fu

A survey on author profiling, deception, and irony detection for the Arabic language

Paolo Rosso, Francisco Rangel, Irazu Hernández Farías, Leticia Cagnina, Wajdi Zaghouani, Anis Charfi
2018 Language and Linguistics Compass  
A survey on author profiling, deception, and irony detection for the Arabic language. Language and Linguistics Compass. 12(4):1-20.  ...  In this paper, we review the state of the art about some of the main author profiling problems, as well as deception and irony detection, especially focusing on the Arabic language.  ...  , as described more in detail in Section 3.2.  Author Profiling at PAN 2017, where together with gender identification, the aim is to detect the language variety of the authors.  ... 
doi:10.1111/lnc3.12275 fatcat:blm24rejubhtjgakvonn7mc4ju

Guidelines and Annotation Framework for Arabic Author Profiling [article]

Wajdi Zaghouani, Anis Charfi
2018 arXiv   pre-print
In this paper, we present the annotation pipeline and the guidelines we wrote as part of an effort to create a large manually annotated Arabic author profiling dataset from various social media sources  ...  Finally, we describe the issues encountered during the annotation phase, especially those related to the peculiarities of Arabic dialectal varieties as used in social media.  ...  Moreover, when the text is written in a dialectal variety such as the Arabic text used in social media, author profiling becomes even more challenging as it requires representative annotated datasets to  ... 
arXiv:1808.07678v1 fatcat:zzp7xihoe5aodm6v6nem54fg2u

BERT-Based Arabic Social Media Author Profiling [article]

Chiyu Zhang, Muhammad Abdul-Mageed
2019 arXiv   pre-print
We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA).  ...  Then we augment shared task data with in-house data for gender and dialect, showing the utility of augmenting training data.  ...  Arabic is a term that refers to a collection of languages, varieties, and dialects.  ... 
arXiv:1909.04181v3 fatcat:gcl62vnrv5fghjccymrojq2ypm

Fine-grained analysis of language varieties and demographics

Francisco Rangel, Paolo Rosso, Wajdi Zaghouani, Anis Charfi
2020 Natural Language Engineering  
In this paper, we focus on a fine-grained analysis of language varieties while considering also the authors' demographics.  ...  We also analyse the relationship of the language variety identification with the authors' gender.  ...  The statements made herein are solely the responsibility of the authors. References  ... 
doi:10.1017/s1351324920000108 fatcat:mdk2yxafbjffnhm5te3b7lscxe

Classifier Ensembles for Dialect and Language Variety Identification [article]

Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi
2018 arXiv   pre-print
In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.  ...  We present a system developed to discriminate between Flemish and Dutch in subtitles and a system trained to discriminate between four Arabic dialects: Egyptian, Levantine, Gulf, North African, and Modern  ...  Acknowledgements We would like to thank the organizers of the ADI shared task and the DFS shared task for making available the datasets used in this paper.  ... 
arXiv:1808.04800v1 fatcat:rgx45vuzarhppeyu3e65io7gda

The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks [article]

Claudia Peersman, Walter Daelemans, Reinhild Vandekerckhove, Bram Vandekerckhove, Leona Van Vaerenbergh
2016 arXiv   pre-print
It also presents a methodology that enables the systematic study of this variation by including all non-standard words in the corpus.  ...  We present a corpus-based analysis of the effects of age, gender and region of origin on the production of both "netspeak" or "chatspeak" features and regional speech features in Flemish Dutch posts that  ...  approach to investigate the correlation between the non-standard language usage and profile metadata in these modes of CMC: in his study on code choice and code-switching in Swiss-German chat rooms, Siebenhaar  ... 
arXiv:1601.02431v1 fatcat:r2furuiz3zhwti47euj7ieoxsu

German Dialect Identification Using Classifier Ensembles [article]

Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu
2018 arXiv   pre-print
The transcripts included in the dataset contained speakers from Basel, Bern, Lucerne, and Zurich. Our entry in the challenge reached 62.03% F1-score and was ranked third out of eight teams.  ...  In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task organized within the scope of the VarDial Evaluation Campaign 2018.  ...  We further thank the anonymous reviewers and Marcos Zampieri for the feedback and suggestions provided.  ... 
arXiv:1807.08230v1 fatcat:zp7ogbswdnh2do6thxbibxtd2m

Exploring morphosyntactic variation in dialects of English across the world

Warren Maguire
2016 English Today  
(many of whom are native speakers of the varieties they describe), WAVE represents a major step forward in our understanding of dialect variation in English and illustrates in fine detail a vast array  ...  Using a list of 235 morphosyntactic features, WAVE explores diatopic variation in L1, L2 and pidgin/creole varieties across the globe, illustrating the results in 96 full-colour maps, and investigates  ...  Firstly, the varieties included in WAVE are divided into five "types": Low-contact traditional L1 dialects (L1t); High-contact L1 varieties (L1c); L2 varieties (L2); Pidgins (P); Creoles (C).  ... 
doi:10.1017/s026607841600033x fatcat:blfdetr54jd5lb6dw27ef7py2m

The non-standard in writing: A look at West African and Southeast Asian literature

Michael PERCILLIER
2018 E-REA  
), particularly in light of differences between authors who are part of the speech community in question, and those who are external to it, and (5) the usage of literary dialects from a diachronic perspective  ...  varieties of language in the literature of the English-speaking world”, and started at the University of Strasbourg in 2013, compares the representation of non-standard language in Anglophone literature  ... 
doi:10.4000/erea.6312 fatcat:hy7dpkce6nb6jo7tgkta6sqahy

Nonmainstream Dialect Use and Specific Language Impairment

Janna B. Oetting, Janet L. McDonald
2001 Journal of Speech, Language and Hearing Research  
Patterns within the SLI profile that cut across the two dialects included difficulties with tense marking and question formation.  ...  Although the grammatical profile of SLI has been explored in a wide range of languages, including Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Spanish, Swedish, and even  ...  Acknowledgments The project was made possible by a grant from the National Institute on Deafness and Other Communication Disorders (R03 DCO3609) that was awarded to the first author and an Interdisciplinary  ... 
doi:10.1044/1092-4388(2001/018) pmid:11218104 pmcid:PMC3381904 fatcat:npnys6vklbebpj4hv2f56ymvie

DELATERALISATION IN ARABIC AND MEHRI

2019 Dialectologia  
However, it was found lately in use in some Arabic dialects such as Rijāl Almaʕ in southwest Saudi Arabia and in some varieties of the Mehri language. Nevertheless, delateralisation is apparent.  ...  Saudi Arabia. 74 speakers, 38 speakers of Rijāl Almaʕ in Abha city and 36 Mehri speakers in Dammam city, participated in this study.  ...  Phonetic analysis was performed using the PRAAT program for acoustic analysis (www.fon.hum.uva.nl/praat/). The authors found that the Mahriyōt dialect, the variety of Mehri spoken in far eastern Yemen,  ... 
doi:10.1344/dialectologia2019.23.1 fatcat:qhk5yo6m3ngn5aqwbvaeswjvrm

Assessing linguistic vulnerability and endangerment in Serbia: a critical survey of methodologies and outcomes

Annemarie Sorescu-Marinković, Mirjana Mirić, Svetlana Ćirković
2020 Balcanica  
The paper offers a critical survey of vulnerable and endangered languages and linguistic varieties in Serbia presented in three international inventories: UNESCO's Atlas of the World'  ...  s Languages in Danger, Ethnologue and The Catalogue of Endangered Languages.  ...  dialects and varieties spoken in Serbia.  ... 
doi:10.2298/balc2051065s fatcat:e5knfr4bpbe3xo2tjntkthu3nu