From Language Documentation Data to LLOD: A Case Study in Turkic Lemon Dictionaries

Christian Chiarcos, Désirée Walther, Maxim Ionov
2017 International Conference on Language, Data, and Knowledge  
In this paper, we describe the Lemon-OntoLex modeling of dictionaries created within language documentation efforts. We focus on exemplary resources for two less-resourced languages from the Turkic language family, Chalkan and Tuvan. Both datasets have been converted into a Linked Data representation using the Lemon-OntoLex data model, with an extensible converter written in Python. We compare the conversion process for the two lexical resources, analyze the difficulties we encountered during the conversion process, and discuss the cases that caused the most common problems. Furthermore, we evaluate the quality of the converted dictionaries using specially designed SPARQL queries and by manually checking random samples of the data. Finally, we describe the future application of this data within a lexicographic-comparative workbench designed to facilitate language contact studies.
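To make the conversion step concrete, here is a minimal sketch of how a single dictionary entry could be serialized as OntoLex-Lemon Turtle using only the Python standard library. It is not the authors' converter. The function names (`entry_to_turtle`, `local_name`, `escape_literal`), the `http://example.org/lexicon/` base namespace, the use of `skos:definition` for glosses, and the placeholder lemma and gloss are all assumptions made for illustration. The language tag `und` (undetermined) stands in for whatever tag a real dataset would use.

```python
import re

# Prefix declarations; the default namespace is a hypothetical placeholder.
PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix : <http://example.org/lexicon/> .\n"
)


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def local_name(lemma, index):
    """Build a local name from the lemma; the index keeps homographs apart."""
    slug = re.sub(r"\W+", "_", lemma).strip("_")
    return f"entry_{index}_{slug}"


def entry_to_turtle(index, lemma, gloss, lang="und", gloss_lang="en"):
    """Serialize one (lemma, gloss) pair as entry, canonical form and sense."""
    name = local_name(lemma, index)
    return (
        f":{name} a ontolex:LexicalEntry ;\n"
        f"    ontolex:canonicalForm :{name}_form ;\n"
        f"    ontolex:sense :{name}_sense .\n"
        f":{name}_form a ontolex:Form ;\n"
        f'    ontolex:writtenRep "{escape_literal(lemma)}"@{lang} .\n'
        f":{name}_sense a ontolex:LexicalSense ;\n"
        f'    skos:definition "{escape_literal(gloss)}"@{gloss_lang} .\n'
    )


if __name__ == "__main__":
    # Placeholder data, not taken from the Chalkan or Tuvan dictionaries.
    print(PREFIXES)
    print(entry_to_turtle(1, "lemma", "a sample gloss"))
```

Output of this kind could be loaded into a triple store and checked with SPARQL queries, for example by counting entries that lack a canonical form or a sense.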
dblp:conf/ldk/ChiarcosWI17 fatcat:lk5jfcae3ncmpacp6n3h3of3ru