ULisboa: Recognition and Normalization of Medical Concepts

André Leal, Bruno Martins, Francisco Couto
2015 Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)  
This paper describes a system developed for the disorder identification subtask within Task 14 of SemEval 2015. The system is a chain of two modules, one for recognition and one for normalization. The recognition module uses an adapted version of the Stanford NER system to train CRF models that recognize disorder mentions. The CRF models were built on a novel encoding of entity spans as token classifications that also covers non-continuous entities, along with a rich set of features based on (i) domain lexicons and (ii) Brown clusters inferred from a large collection of clinical texts. For disorder normalization, we (i) generated an unambiguous dictionary of abbreviations from the labelled files, using it together with (ii) a heuristic method based on similarity search and (iii) a comparison method based on the information content of each disorder. The system achieved an F-measure of 0.740 (the second best), with a precision of 0.779 and a recall of 0.705.
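The reported scores can be cross-checked with a short sketch, assuming the F-measure is the balanced F1 (the harmonic mean of precision and recall). The abstract does not state the formula, so the function name `f_measure` and the `beta` parameter are illustrative and not part of the described system.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the balanced F1 (assumed here)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
score = f_measure(0.779, 0.705)
print(f"{score:.3f}")  # prints 0.740
```

Under this assumption the computed value agrees with the reported F-measure of 0.740.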
doi:10.18653/v1/s15-2070 dblp:conf/semeval/LealMC15 fatcat:mdlm5ix6ajbznmnrppwhal25mq