Multilingual Semantic Resources and Parallel Corpora in the Biomedical Domain: the CLEF-ER Challenge

Dietrich Rebholz-Schuhmann, Simon Clematide, Fabio Rinaldi, Senay Kafkas, Erik M. van Mulligen, Quoc-Chinh Bui, Johannes Hellrich, Ian Lewin, David Milward, Michael Poprat, Antonio Jimeno-Yepes, Udo Hahn (+1 other)
2013 Conference and Labs of the Evaluation Forum  
Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This is the main objective of the CLEF-ER challenge, which involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the biomedical domain: titles from scientific articles, drug labels from the European Medicines Agency, and patent texts from the European Patent …ce. The parallel units have been identified, marked up and formatted for future use. The three corpora are of comparable size. In preparation for the CLEF-ER challenge, the documents have been annotated with terminologies in English and in non-English languages (de, fr, es and nl), and the pre-existing terminological resource has been optimized for the entity recognition task in CLEF-ER. Finally, a silver standard corpus of entity annotations and their identifiers has been produced on the English documents for the evaluation of challenge contributions.

Motivation

Biomedical IT solutions require terminological resources (TRs) to achieve interoperability of modules and data. Increasingly, such IT solutions require multilingual TRs, since they are used in different countries to capture and encode patient-related information in the home language. To this end, the biomedical terminologies have to be produced in different languages and entities have to
dblp:conf/clef/Rebholz-SchuhmannCRKMBHLMPJHK13a fatcat:xpzdujrgpbbzzl7xmrjget6i64