Knowledge Discovery in Variant Databases Using Inductive Logic Programming

Hoan Nguyen, Tien-Dao Luu, Olivier Poch, Julie D. Thompson
2013 Bioinformatics and Biology Insights  
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from databases (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants ... to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.
doi:10.4137/bbi.s11184 pmid:23589683 pmcid:PMC3615990 fatcat:tp7dx3s5ybdnrnptlkhwl6lula