Enhancing semantic relation quality of UMLS knowledge sources

Demeke Ayele, Jean-Pierre Chevallet, Getnet Kassie, Million Meshesha
2012 Proceedings of the International Conference on Management of Emergent Digital EcoSystems - MEDES '12  
The quality of semantic tuples (semantic triples of the form subject-predicate-object) has a significant impact on most text mining and knowledge discovery applications. The practical success and usability of these applications depend heavily on the quality of the extracted semantic triples. Most biomedical semantic resources have been developed for different contexts and focus on structural representation, paying less attention to the acceptability and naturalness of the individual semantic triples. In this article, we present an integrated approach for enhancing the quality of semantic tuples in the UMLS knowledge sources. The approach integrates three existing auditing techniques: avoiding redundant classification of semantic concepts, reducing hierarchical relationship inconsistencies, and reducing associative relationship inconsistencies. We evaluated the approach by the number of wrongly assigned concepts and inconsistent relationships it identified, and we assessed the quality of each semantic triple in terms of its acceptability and naturalness. The evaluation shows promising results. We randomly extracted 10,082 semantic triples from UMLS, of which 5,646 are taxonomically related and 4,436 are non-taxonomically related. 826 concepts were found to be redundantly classified and 352 to be hierarchically inconsistent. Of the 4,436 non-taxonomic semantic triples, 726 were found to be inconsistent. Domain experts also evaluated the quality (acceptability and naturalness) of each of the first 100 semantic triples. Cohen's kappa coefficient was used to measure the agreement between the annotators, and the result is promising (κ = 0.8).
doi:10.1145/2457276.2457289 dblp:conf/medes/AyeleCKM12 fatcat:j77kvewzlnfjphdigrw7ywuzw4
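The abstract does not spell out how redundant classification is detected. One common reading, assumed here, is that a concept is redundantly classified when it is assigned both a semantic type and one of that type's is-a ancestors, because the more specific type already implies the general one. The sketch below applies that rule to a small hypothetical fragment of a semantic-type hierarchy; `PARENT`, `ancestors`, and `redundant_types` are illustrative names, not part of any UMLS API, and the hierarchy is hand-written rather than loaded from the UMLS files.

```python
# Hypothetical, hand-written fragment of an is-a hierarchy between semantic
# types (child -> parent). Illustrative only; not loaded from UMLS and not
# guaranteed to match any particular Semantic Network release.
PARENT = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
}


def ancestors(semantic_type: str) -> set[str]:
    """Return every transitive is-a ancestor of a semantic type."""
    result = set()
    current = PARENT.get(semantic_type)
    while current is not None:
        result.add(current)
        current = PARENT.get(current)
    return result


def redundant_types(assigned: set[str]) -> set[str]:
    """Return assigned types already implied by a more specific assigned type."""
    redundant = set()
    for semantic_type in assigned:
        # Any assigned ancestor of this type adds no information.
        redundant |= ancestors(semantic_type) & assigned
    return redundant


# A hypothetical concept assigned both a specific type and its parent type.
print(redundant_types({"Disease or Syndrome", "Pathologic Function"}))
# {'Pathologic Function'}
```

Computing the full ancestor set, rather than checking only the direct parent, also catches redundancy when a concept is assigned a type two or more levels above its most specific type.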
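The agreement figure reported in the abstract uses Cohen's kappa, defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of items on which the two annotators agree and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it for two annotators. The labels in the usage example are invented for illustration and do not reproduce the paper's κ = 0.8.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both annotators used one identical label;
        # returning 1.0 here is a convention, not part of the definition.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical acceptability judgements for 10 triples by two annotators.
annotator_1 = ["yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"]
annotator_2 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.524
```

Here the annotators agree on 8 of 10 items (p_o = 0.8), but both say "yes" 70% of the time, so p_e = 0.58 and κ = 0.22 / 0.42 ≈ 0.524. This is why kappa is preferred to raw percentage agreement when one label dominates.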