k-Neighborhood decentralization: A comprehensive solution to index the UMLS for large scale knowledge discovery

Yang Xiang, Kewei Lu, Stephen L. James, Tara B. Borlawsky, Kun Huang, Philip R.O. Payne
2012 Journal of Biomedical Informatics  
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (KDLS) for the UMLS, and the ... nding method to effectively evaluate the KDLS indexing results. KDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use KDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that KDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications.
doi:10.1016/j.jbi.2011.11.012 pmid:22154838 pmcid:PMC3306517 fatcat:zd7nlyavxvbpphhwdgankfrw5q