Corpus-based Translation of Ontologies for Improved Multilingual Semantic Annotation

Claudia Bretschneider, Heiner Oberkampf, Sonja Zillner, Bernhard Bauer, Matthias Hammon
2014 Proceedings of the Third Workshop on Semantic Web and Information Extraction  
Ontologies have proven useful for enhancing NLP-based applications such as information extraction. In the biomedical domain, rich ontologies are available and are used for semantic annotation of texts. However, most of them have no or only a few non-English concept labels and therefore cannot be used to annotate non-English texts. Since translations need expert review, a full translation of large ontologies is often not feasible. For semantic annotation purposes, we propose to use the corpus to be annotated to identify high-occurrence terms and their translations in order to extend the respective ontology concepts. Using our approach, the translation of a subset of ontology concepts is sufficient to significantly enhance annotation coverage. For evaluation, we automatically translated RadLex ontology concepts from English into German. We show that by translating a rather small set of concepts (in our case 433), which were identified by corpus analysis, we are able to increase the share of annotated words from 27.36 % to 42.65 %. This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
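The corpus-driven selection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy German tokens, and the exact-match notion of "annotated" are all assumptions made here for illustration, and the translation step itself is left out. The sketch only shows how counting frequent unmatched corpus terms and adding labels for a few of them can raise the share of annotated words.

```python
from collections import Counter


def annotation_coverage(tokens, labels):
    """Share of corpus tokens that exactly match a known concept label."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in labels) / len(tokens)


def top_uncovered_terms(tokens, labels, k):
    """The k most frequent tokens that have no matching concept label yet."""
    counts = Counter(t for t in tokens if t not in labels)
    return [term for term, _ in counts.most_common(k)]


# Toy corpus and label set, invented for illustration (not data from the paper).
tokens = "leber milz leber niere leber milz lunge".split()
labels = {"niere"}  # hypothetical set of German labels already in the ontology

before = annotation_coverage(tokens, labels)

# Hypothetical step: pretend the top-k uncovered terms received translated labels.
candidates = top_uncovered_terms(tokens, labels, k=2)
after = annotation_coverage(tokens, labels | set(candidates))

print(f"coverage before: {before:.2%}, after: {after:.2%}")
```

In this toy setting, covering only the two most frequent unmatched terms accounts for most of the corpus, which mirrors the abstract's point that a small, corpus-selected subset of concepts can yield a large coverage gain.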
doi:10.3115/v1/w14-6201 dblp:conf/acl-swaie/BretschneiderOZ14 fatcat:g7gvyimn5zbmxjnmwg4wjunaxm