Recent advances in predicting gene–disease associations

Kenneth Opap, Nicola Mulder
F1000Research 2017, 6(F1000 Faculty Rev):578. Last updated: 26 APR 2017

Deciphering gene-disease associations is a crucial step in designing therapeutic strategies against diseases. There are experimental methods for identifying gene-disease associations, such as genome-wide association studies and linkage analysis, but these can be expensive and time-consuming. As a result, various in silico methods for predicting associations from these and other data have been developed using different approaches. In this article, we review some of the recent approaches to the computational prediction of gene-disease associations. We look at recent advancements in algorithms, categorising them into those based on genome variation, networks, text mining, and crowdsourcing. We also look at some of the challenges faced in the computational prediction of gene-disease associations.
doi:10.12688/f1000research.10788.1 pmid:28529714 pmcid:PMC5414807 fatcat:hzph7zrm55ecje7fsp2yfk66uq