Exploring Methods and Resources for Discriminating Similar Languages

Marco Lui, Ned Letcher, Oliver Adams, Long Duong, Paul Cook, Timothy Baldwin
2014 Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects  
The Discriminating between Similar Languages (DSL) shared task at VarDial challenged participants to build an automatic language identification system to discriminate between 13 languages in 6 groups of highly-similar languages (or national varieties of the same language). In this paper, we describe the submissions made by team UniMelb-NLP, which took part in both the closed and open categories. We present the text representations and modeling techniques used, including cross-lingual POS
... as well as fine-grained tags extracted from a deep grammar of English, and discuss additional data we collected for the open submissions, utilizing custom-built web corpora based on top-level domains as well as existing corpora.
doi:10.3115/v1/w14-5315 dblp:conf/vardial/LuiLADCB14 fatcat:ci2px3whwvel5efx6sqs7tpt3i