APPLICATION OF GRAPH MODELS IN BIOINFORMATICS: THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Balázs Ligeti, Sándor Pongor
2016
unpublished
Biomedical sciences use a variety of data sources on drug molecules, genes, proteins, complete genome sequences, diseases, scientific publications, etc. This system is best pictured as a giant data network linked together by physical, functional, logical and similarity relationships. A new hypothesis or discovery can be considered a new link that can be deduced from the existing connections. For instance, the interaction of two pharmacons, if not already known, represents a testable novel hypothesis. Such implicit effects are especially important in complex diseases such as cancer. Huge amounts of data are currently generated by experiments such as whole-genome sequencing of metagenomic samples. Deriving new information, i.e. linking the experiments to microorganisms and supporting the new hypothesis with known data, requires proper analysis of a data network. The goal of the investigations carried out in this thesis is to predict novel drug combinations or novel biomarkers using the network of existing oncological and protein interaction databases, and to interpret and analyze large metagenomic datasets using network principles. I showed that the overlap of network neighborhoods is strongly correlated with the pairwise interaction strength of two pharmacons used in cancer therapy, and that it is also well correlated with clinical data. The strategy, based on the hypothesis that novel, implicit links can be discovered between the network neighborhoods of data items, led to the discovery of novel biomarkers based on text analysis. In 2012 I prioritized ten potential biomarkers for ovarian cancer, two of which were in fact described as such in the subsequent years. I showed that applying network principles and fast aligners in the evaluation of metagenomic whole-genome sequencing experiments can improve classification performance, and that even sensitive detection of pathogens is possible. The strategy seems to hold promise in several applications, including prioritization of new drug combinations, discovery of novel biomarkers for experimental testing, and sensitive detection of pathogens. Its use is naturally limited by the sparsity and quality of experimental data; however, these aspects are expected to improve as current databases develop.
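The idea of overlapping network neighborhoods mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy interaction graph, the node names, the drug target sets, the radius parameter, and the use of the Jaccard index as the overlap measure. The abstract does not state which overlap measure or neighborhood definition the thesis actually uses.

```python
from collections import deque


def neighborhood(graph, seeds, radius=1):
    """Return every node within `radius` hops of any seed (seeds included)."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seen)
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def neighborhood_overlap(graph, targets_a, targets_b, radius=1):
    """Jaccard index of two neighborhoods (assumed measure, not the thesis's)."""
    na = neighborhood(graph, targets_a, radius)
    nb = neighborhood(graph, targets_b, radius)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical undirected protein interaction network (adjacency sets).
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P4"},
    "P3": {"P1", "P4"},
    "P4": {"P2", "P3", "P5"},
    "P5": {"P4"},
    "P6": set(),
}

# Hypothetical target sets for three drugs.
drug_a, drug_b, drug_c = {"P1"}, {"P4"}, {"P6"}

score_ab = neighborhood_overlap(graph, drug_a, drug_b)  # shared neighbors P2, P3
score_ac = neighborhood_overlap(graph, drug_a, drug_c)  # no shared neighbors
```

In this toy setting a higher score for a drug pair would be read as a hint that their targets act in the same network region; whether that tracks interaction strength is the empirical claim the thesis reports, not something the sketch demonstrates.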
fatcat:plpze5ix65exbfbeuzudomzgdy