Challenges Behind the Data-driven Bulgarian WordNet (BulTreeBank Bulgarian Wordnet)

Petya Osenova, Kiril Ivanov Simov
2017 International Conference on Language, Data, and Knowledge  
The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both resources, the syntactic and the lexical, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered through the exploration of bigger corpora.
The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered, both from a cross-lingual perspective and with respect to ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.
dblp:conf/ldk/OsenovaS17 fatcat:algvqt5eencdxp7gdxt7hpgcgy