Experiments in Language Variety Geolocation and Dialect Identification

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
2020 Workshop on NLP for Similar Languages, Varieties and Dialects  
In this paper we describe the systems we used when participating in the VarDial Evaluation Campaign organized as part of the 7th workshop on NLP for similar languages, varieties and dialects.  ...  The submissions of our SUKI team used generative language models based on Naive Bayes and character n-grams.  ...  The shared tasks have been organized as part of the VarDial workshops dealing with computational methods and language resources for closely related languages, language varieties, and dialects.  ... 
dblp:conf/vardial/JauhiainenJL20 fatcat:mtv4hjmviramdgjy7khn4kan44

LSTM Autoencoders for Dialect Analysis

Taraka Rama, Çağrı Çöltekin
2016 Workshop on NLP for Similar Languages, Varieties and Dialects  
Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group.  ...  We apply our architectures to three different datasets and show that the learned representations indicate highly similar results with the analyses based on Levenshtein distance and capture the traditional  ...  Dialects of Germany Figure 6 presents similar analyses for dialects of Germany.  ... 
dblp:conf/vardial/RamaC16 fatcat:vk7ao2hf4nc2jedf3n2k77jlpi

Exploring Classifier Combinations for Language Variety Identification

Tim Kreutz, Walter Daelemans
2018 Workshop on NLP for Similar Languages, Varieties and Dialects  
This paper describes CLiPS's submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018.  ...  This confidence vote approach outperforms a meta-classifier on the development data and on the test data.  ...  Since the Netherlands and Flanders adhere to the same standard language (Dutch), the task at hand is one of language variety identification rather than similar language identification.  ... 
dblp:conf/vardial/KreutzD18 fatcat:kwxlqk4ckjdq7fyajdaotdyegm

Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth

Jennifer Williams, Charlie Dagli
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
Our method allows for fine-grained distinctions between similar languages and dialects and allows us to rediscover the language composition of our Twitter dataset.  ...  We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID).  ...  Discriminating languages and dialects automatically is a critical pre-processing step for more advanced NLP applications (Dagli et al., 2016).  ... 
doi:10.18653/v1/w17-1209 dblp:conf/vardial/WilliamsD17 fatcat:tvxhoebq4rfi5aha6hs6grmbwe

A Tokenization System for the Kurdish Language

Sina Ahmadi
2020 Workshop on NLP for Similar Languages, Varieties and Dialects  
Tokenization is one of the essential and fundamental tasks in natural language processing.  ...  In this paper, as a preliminary study of its kind, we propose an approach for the tokenization of the Sorani and Kurmanji dialects of Kurdish using a lexicon and a morphological analyzer.  ...  Acknowledgements The author would like to thank the four anonymous reviewers for their constructive comments.  ... 
dblp:conf/vardial/Ahmadi20a fatcat:3ypb7eiy2vamhke4u66frj5ske

Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties

Pablo Gamallo, Iñaki Alegria, José Ramom Pichel Campos, Manex Agirrezabal
2016 Workshop on NLP for Similar Languages, Varieties and Dialects  
This article describes the systems submitted by the Citius Ixa Imaxin team to the Discriminating Similar Languages Shared Task 2016.  ...  The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers.  ...  First, the sub-task 1 is focused on discriminating between similar languages and national language varieties, including five different groups of related languages or language varieties: • Bosnian, Croatian  ... 
dblp:conf/vardial/GamalloACA16 fatcat:yzl5jwfhcvdghnqk3u4ascikqa

A Simple Baseline for Discriminating Similar Languages

Matthew Purver
2014 Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects  
This paper describes an approach to discriminating similar languages using word- and character-based features, submitted as the Queen Mary University of London entry to the Discriminating Similar Languages  ...  Using a standard supervised classifier with word and character n-grams as features, we achieved over 90% accuracy in the test; on fixing simple file handling and feature extraction bugs, this improved  ...  Background and Related Work Shared Task The Discriminating Similar Languages (DSL) Shared Task was established as part of the 2014 VarDial workshop. The task provided datasets for 13 different languages  ... 
doi:10.3115/v1/w14-5318 dblp:conf/vardial/Purver14 fatcat:sobafnu4qndkfiruiavsihkc5a

Subdialectal Differences in Sorani Kurdish

Shervin Malmasi
2016 Workshop on NLP for Similar Languages, Varieties and Dialects  
This is the first preliminary study for a dialect that has not been widely studied in computational linguistics, evidencing the possible existence of distinct subdialects.  ...  In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq.  ...  Acknowledgements A special thanks to the reviewers for their helpful comments and feedback.  ... 
dblp:conf/vardial/Malmasi16 fatcat:a67mshqidfhgvg5alesaww4sju

Character Level Convolutional Neural Network for Arabic Dialect Identification

Mohamed Ali
2018 Workshop on NLP for Similar Languages, Varieties and Dialects  
We submitted three models with the same architecture except for the first layer. The first system uses one-hot character representation as input to the convolution layer.  ...  The ADI shared task included five Arabic dialects: Modern Standard Arabic (MSA), Egyptian, Gulf, Levantine, and North-African.  ...  Arabic Dialect Identification task is concerned with identifying the specific Arabic dialect in spoken and written forms which is a crucial task in many Natural Language Processing (NLP) applications.  ... 
dblp:conf/vardial/Ali18 fatcat:b4pzzwidyjczthwj7f5o226e3y

Discriminating between Similar Languages using Weighted Subword Features

Adrien Barbaresi
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
I present and discuss the method used in this 14-way language identification task comprising varieties of 6 main language groups.  ...  The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.  ...  Acknowledgments Thanks to the anonymous reviewers for their comments.  ... 
doi:10.18653/v1/w17-1223 dblp:conf/vardial/Barbaresi17 fatcat:l55znhnhjvcszdzlokgcrf523m

Kurdish Interdialect Machine Translation

Hossein Hassani
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
The research is the first attempt for inter-dialect machine translation in Kurdish and particularly could help in making online texts in one dialect comprehensible to those who only speak the target dialect  ...  They are rated as slightly understandable in 29% cases for Kurmanji and 21% for Sorani.  ...  Dzejla Medjedovic an Assistant Professor and Vice Dean of Graduate Program at the University Sarajevo School of Science and Technology (SSST) for reviewing this paper and providing influential recommendations  ... 
doi:10.18653/v1/w17-1208 dblp:conf/vardial/Hassani17 fatcat:swg3su7e5jahfk4b34r7nlsooy

German Dialect Identification in Interview Transcriptions

Shervin Malmasi, Marcos Zampieri
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
The three systems we submitted are based on: a plurality ensemble, a mean probability ensemble, and a meta-classifier trained on character and word n-grams.  ...  The task consists of training models to identify the dialect of Swiss-German speech transcripts. The dialects included in the GDI dataset are Basel, Bern, Lucerne, and Zurich.  ...  Acknowledgement We would like to thank the GDI task organizers, Noëmi Aepli and Yves Scherrer, for proposing and organizing this shared task.  ... 
doi:10.18653/v1/w17-1220 dblp:conf/vardial/MalmasiZ17 fatcat:mz33sdefcbdljozqdsv2k2fw7a

Computational analysis of Gondi dialects

Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.  ...  We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology.  ...  The code and the data for the experiments is available at https://github.com/PhyloStar/Gondi-Dialect-Analysis  ... 
doi:10.18653/v1/w17-1203 dblp:conf/vardial/RamaCS17 fatcat:wgdgllbwbvf7dha66aqzt3cpxa

The similarity and Mutual Intelligibility between Amharic and Tigrigna Varieties

Tekabe Legesse Feleke
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
The Amharic speakers' familiarity to the Tigrigna varieties seems largely dependent on the genealogical relation between Amharic and the two Tigrigna varieties.  ...  The present study has examined the similarity and the mutual intelligibility between Amharic and two Tigrigna varieties using three tools, namely Levenshtein distance, intelligibility test and questionnaires  ...  Recently, several studies have been conducted on European languages and on Chinese dialects, for example, (Gooskens and Heeringa, 2004; Tang and Heuven, 2007; Tang and Heuven, 2009; Tang and Heuven, 2015  ... 
doi:10.18653/v1/w17-1206 dblp:conf/vardial/Feleke17 fatcat:6jf6wbaafjeunfe7rgajqvj22a

Using Maximum Entropy Models to Discriminate between Similar Languages and Varieties

Jordi Porta, José-Luis Sancho
2014 Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects  
DSLRAE is a hierarchical classifier for similar written languages and varieties based on maximum-entropy (maxent) classifiers.  ...  For each group of languages, the classifier uses a different kind and combination of knowledge-poor features: token or character n-grams and 'white lists' of tokens.  ...  Similar or closely related languages often reflect a common origin and are members of a dialect continuum (Bloomfield, 1935).  ... 
doi:10.3115/v1/w14-5314 dblp:conf/vardial/PortaS14 fatcat:2bmlsrurhzfldh5q3jnwapxm5i