Experiments in Language Variety Geolocation and Dialect Identification
2020
Workshop on NLP for Similar Languages, Varieties and Dialects
In this paper we describe the systems we used when participating in the VarDial Evaluation Campaign organized as part of the 7th workshop on NLP for similar languages, varieties and dialects. ...
The submissions of our SUKI team used generative language models based on Naive Bayes and character n-grams. ...
The shared tasks have been organized as part of the VarDial workshops dealing with computational methods and language resources for closely related languages, language varieties, and dialects. ...
dblp:conf/vardial/JauhiainenJL20
fatcat:mtv4hjmviramdgjy7khn4kan44
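The SUKI snippet mentions generative language models built from Naive Bayes and character n-grams. The following is a rough illustration only, not the team's actual system: a multinomial Naive Bayes classifier over character trigrams with add-one smoothing. The class name, the n-gram order, the space padding and the smoothing scheme are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams, padding the text with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesCharNgram:
    """Hypothetical multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = defaultdict(Counter)  # label -> n-gram counts
        self.totals: Counter = Counter()       # label -> total n-gram tokens
        self.doc_counts: Counter = Counter()   # label -> number of training texts
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesCharNgram":
        for text, label in zip(texts, labels):
            grams = char_ngrams(text.lower(), self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def log_score(self, text: str, label: str) -> float:
        # log prior plus the smoothed log likelihood of every n-gram in the text
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        for gram in char_ngrams(text.lower(), self.n):
            score += math.log((self.counts[label][gram] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))
```

Usage: `NaiveBayesCharNgram(n=3).fit(texts, labels).predict(new_text)`, where `texts` and `labels` are parallel lists of training strings and variety labels.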
LSTM Autoencoders for Dialect Analysis
2016
Workshop on NLP for Similar Languages, Varieties and Dialects
Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group. ...
We apply our architectures to three different datasets and show that the learned representations yield results highly similar to those of the analyses based on Levenshtein distance and capture the traditional ...
Dialects of Germany: Figure 6 presents similar analyses for dialects of Germany. ...
dblp:conf/vardial/RamaC16
fatcat:vk7ao2hf4nc2jedf3n2k77jlpi
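Two entries in this list, the LSTM dialectometry paper and the Amharic and Tigrigna study, rely on Levenshtein distance. Below is a minimal sketch of the standard edit distance, plus one simple way to aggregate it over aligned word lists from two dialects: divide each distance by the longer word's length, then average. The aggregation and normalisation are illustrative assumptions, not the procedure used in either paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def aggregate_distance(words_a: list[str], words_b: list[str]) -> float:
    """Hypothetical aggregate: mean length-normalised distance over aligned word pairs."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("expects two non-empty, equally long aligned word lists")
    total = 0.0
    for wa, wb in zip(words_a, words_b):
        total += levenshtein(wa, wb) / max(len(wa), len(wb), 1)
    return total / len(words_a)
```

Normalising by word length keeps long words from dominating the average. Whether to normalise at all, and by what, varies across dialectometry studies.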
Exploring Classifier Combinations for Language Variety Identification
2018
Workshop on NLP for Similar Languages, Varieties and Dialects
This paper describes CLiPS's submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. ...
This confidence vote approach outperforms a meta-classifier on the development data and on the test data. ...
Since the Netherlands and Flanders adhere to the same standard language (Dutch), the task at hand is one of language variety identification rather than similar language identification. ...
dblp:conf/vardial/KreutzD18
fatcat:kwxlqk4ckjdq7fyajdaotdyegm
Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Our method allows for fine-grained distinctions between similar languages and dialects and allows us to rediscover the language composition of our Twitter dataset. ...
We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID). ...
Discriminating languages and dialects automatically is a critical pre-processing step for more advanced NLP applications (Dagli et al., 2016). ...
doi:10.18653/v1/w17-1209
dblp:conf/vardial/WilliamsD17
fatcat:tvxhoebq4rfi5aha6hs6grmbwe
A Tokenization System for the Kurdish Language
2020
Workshop on NLP for Similar Languages, Varieties and Dialects
Tokenization is one of the essential and fundamental tasks in natural language processing. ...
In this paper, as a preliminary study of its kind, we propose an approach for the tokenization of the Sorani and Kurmanji dialects of Kurdish using a lexicon and a morphological analyzer. ...
Acknowledgements: The author would like to thank the four anonymous reviewers for their constructive comments. ...
dblp:conf/vardial/Ahmadi20a
fatcat:3ypb7eiy2vamhke4u66frj5ske
Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties
2016
Workshop on NLP for Similar Languages, Varieties and Dialects
This article describes the systems submitted by the Citius Ixa Imaxin team to the Discriminating Similar Languages Shared Task 2016. ...
The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. ...
First, sub-task 1 focuses on discriminating between similar languages and national language varieties, including five different groups of related languages or language varieties: Bosnian, Croatian ...
dblp:conf/vardial/GamalloACA16
fatcat:yzl5jwfhcvdghnqk3u4ascikqa
A Simple Baseline for Discriminating Similar Languages
2014
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects
This paper describes an approach to discriminating similar languages using word- and character-based features, submitted as the Queen Mary University of London entry to the Discriminating Similar Languages ...
Using a standard supervised classifier with word and character n-grams as features, we achieved over 90% accuracy in the test; on fixing simple file handling and feature extraction bugs, this improved ...
Background and Related Work: Shared Task. The Discriminating Similar Languages (DSL) Shared Task was established as part of the 2014 VarDial workshop. The task provided datasets for 13 different languages ...
doi:10.3115/v1/w14-5318
dblp:conf/vardial/Purver14
fatcat:sobafnu4qndkfiruiavsihkc5a
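The Queen Mary entry describes a standard supervised classifier with word and character n-grams as features. A hedged sketch of how such a sparse feature vector might be built follows. Counts are keyed with a `w:` or `c:` prefix so that word and character features never collide. The n-gram orders, the lowercasing and the whitespace tokenisation are assumptions, and the resulting counts would still need to be passed to a classifier of your choice.

```python
from collections import Counter


def extract_features(text: str, char_orders=(1, 2, 3), word_orders=(1, 2)) -> Counter:
    """Hypothetical bag of word and character n-gram counts, keyed with a type prefix."""
    features: Counter = Counter()
    tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
    for n in word_orders:
        for i in range(len(tokens) - n + 1):
            features["w:" + " ".join(tokens[i:i + n])] += 1
    normalized = " ".join(tokens)  # collapse runs of whitespace before character n-grams
    for n in char_orders:
        for i in range(len(normalized) - n + 1):
            features["c:" + normalized[i:i + n]] += 1
    return features
```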
Subdialectal Differences in Sorani Kurdish
2016
Workshop on NLP for Similar Languages, Varieties and Dialects
This is the first preliminary study for a dialect that has not been widely studied in computational linguistics, evidencing the possible existence of distinct subdialects. ...
In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq. ...
Acknowledgements: A special thanks to the reviewers for their helpful comments and feedback. ...
dblp:conf/vardial/Malmasi16
fatcat:a67mshqidfhgvg5alesaww4sju
Character Level Convolutional Neural Network for Arabic Dialect Identification
2018
Workshop on NLP for Similar Languages, Varieties and Dialects
We submitted three models with the same architecture except for the first layer. The first system uses one-hot character representation as input to the convolution layer. ...
The ADI shared task included five Arabic dialects: Modern Standard Arabic (MSA), Egyptian, Gulf, Levantine, and North-African. ...
The Arabic Dialect Identification task is concerned with identifying the specific Arabic dialect in spoken and written forms, which is a crucial task in many Natural Language Processing (NLP) applications. ...
dblp:conf/vardial/Ali18
fatcat:b4pzzwidyjczthwj7f5o226e3y
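The first of the three ADI systems feeds a one-hot character representation into the convolution layer. Below is a small, framework-free sketch that turns a string into such a matrix, with one row per character position and one column per vocabulary symbol. A CNN library would then consume it as a sequence of channel vectors. The vocabulary construction, the reserved column for unknown characters and the fixed `max_len` are assumptions, not details from the paper.

```python
def build_char_vocab(texts: list[str]) -> dict[str, int]:
    """Map each distinct character to an index; column 0 is reserved for unknown characters."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def one_hot_encode(text: str, vocab: dict[str, int], max_len: int) -> list[list[int]]:
    """Return a max_len x (len(vocab) + 1) one-hot matrix.

    Text longer than max_len is truncated; positions past the end of the text
    stay as all-zero padding rows.
    """
    width = len(vocab) + 1
    matrix = [[0] * width for _ in range(max_len)]
    for pos, ch in enumerate(text[:max_len]):
        matrix[pos][vocab.get(ch, 0)] = 1  # unseen characters map to column 0
    return matrix
```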
Discriminating between Similar Languages using Weighted Subword Features
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
I present and discuss the method used in this 14-way language identification task comprising varieties of 6 main language groups. ...
The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task. ...
Acknowledgments: Thanks to the anonymous reviewers for their comments. ...
doi:10.18653/v1/w17-1223
dblp:conf/vardial/Barbaresi17
fatcat:l55znhnhjvcszdzlokgcrf523m
Kurdish Interdialect Machine Translation
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
The research is the first attempt for inter-dialect machine translation in Kurdish and particularly could help in making online texts in one dialect comprehensible to those who only speak the target dialect ...
They are rated as slightly understandable in 29% of cases for Kurmanji and 21% for Sorani. ...
Dzejla Medjedovic, an Assistant Professor and Vice Dean of the Graduate Program at the University Sarajevo School of Science and Technology (SSST), for reviewing this paper and providing influential recommendations ...
doi:10.18653/v1/w17-1208
dblp:conf/vardial/Hassani17
fatcat:swg3su7e5jahfk4b34r7nlsooy
German Dialect Identification in Interview Transcriptions
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
The three systems we submitted are based on: a plurality ensemble, a mean probability ensemble, and a meta-classifier trained on character and word n-grams. ...
The task consists of training models to identify the dialect of Swiss-German speech transcripts. The dialects included in the GDI dataset are Basel, Bern, Lucerne, and Zurich. ...
Acknowledgement: We would like to thank the GDI task organizers, Noëmi Aepli and Yves Scherrer, for proposing and organizing this shared task. ...
doi:10.18653/v1/w17-1220
dblp:conf/vardial/MalmasiZ17
fatcat:mz33sdefcbdljozqdsv2k2fw7a
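The GDI entry contrasts a plurality ensemble with a mean probability ensemble. Below is a minimal sketch of both combination rules. It assumes each base classifier returns a probability distribution over the same dialect labels. The function names and the toy probabilities are invented for illustration. The meta-classifier variant, which trains a second model on the base classifiers' outputs, is not shown.

```python
from collections import Counter


def plurality_vote(predictions: list[str]) -> str:
    """Label predicted by the most base classifiers (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def mean_probability(prob_dists: list[dict[str, float]]) -> str:
    """Average each label's probability across classifiers and return the highest mean."""
    labels = prob_dists[0].keys()
    means = {label: sum(d[label] for d in prob_dists) / len(prob_dists) for label in labels}
    return max(means, key=means.get)


# Toy outputs from three hypothetical base classifiers (BE = Bern, ZH = Zurich)
dists = [
    {"BE": 0.40, "ZH": 0.60},
    {"BE": 0.45, "ZH": 0.55},
    {"BE": 0.95, "ZH": 0.05},
]
print(plurality_vote([max(d, key=d.get) for d in dists]))  # ZH
print(mean_probability(dists))  # BE
```

The two rules can disagree. Plurality counts only each classifier's top label. The mean probability rule lets one very confident classifier outweigh two weakly confident ones.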
Computational analysis of Gondi dialects
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group. ...
We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. ...
The code and the data for the experiments are available at https://github.com/PhyloStar/Gondi-Dialect-Analysis ...
doi:10.18653/v1/w17-1203
dblp:conf/vardial/RamaCS17
fatcat:wgdgllbwbvf7dha66aqzt3cpxa
The similarity and Mutual Intelligibility between Amharic and Tigrigna Varieties
2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
The Amharic speakers' familiarity with the Tigrigna varieties seems largely dependent on the genealogical relation between Amharic and the two Tigrigna varieties. ...
The present study has examined the similarity and the mutual intelligibility between Amharic and two Tigrigna varieties using three tools, namely Levenshtein distance, an intelligibility test and questionnaires ...
Recently, several studies have been conducted on European languages and on Chinese dialects, for example, (Gooskens and Heeringa, 2004; Tang and Heuven, 2007; Tang and Heuven, 2009; Tang and Heuven, 2015 ...
doi:10.18653/v1/w17-1206
dblp:conf/vardial/Feleke17
fatcat:6jf6wbaafjeunfe7rgajqvj22a
Using Maximum Entropy Models to Discriminate between Similar Languages and Varieties
2014
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects
DSLRAE is a hierarchical classifier for similar written languages and varieties based on maximum-entropy (maxent) classifiers. ...
For each group of languages, the classifier uses a different kind and combination of knowledge-poor features: token or character n-grams and 'white lists' of tokens. ...
Similar or closely related languages often reflect a common origin and are members of a dialect continuum (Bloomfield, 1935). ...
doi:10.3115/v1/w14-5314
dblp:conf/vardial/PortaS14
fatcat:2bmlsrurhzfldh5q3jnwapxm5i