Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge

Kishlay Jha, Wei Jin
2016 Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '16  
The problem of inferring novel knowledge from implicit facts by logically connecting independent fragments of literature is known as Literature Based Discovery (LBD). In LBD, to discover hidden links, it is important to determine the relevancy between concepts using appropriate information measures. In this study, to discover interesting and inherent links latent in large corpora, nine distinct methods, comprising variants of statistical information measures and derived semantic knowledge from ... main ontology, are designed and compared. A series of experiments is performed and analyzed for the proposed methods. Also, a new strategy of effective preprocessing is proposed, which is capable of removing terms that have meager chances of constituting a new discovery. Finally, an organized list of final concepts deemed worthy of scientific investigation is provided to the user. Overall, our research presents a comprehensive analysis and perspective of how different statistical information measures and semantic knowledge affect the knowledge discovery procedure.
doi:10.1145/2975167.2975200 dblp:conf/bcb/JhaJ16 fatcat:ldtlfdods5hn7n4hxonqbc7fsq