Compact Integration of Multi-Network Topology for Functional Analysis of Genes
Graphical Abstract

Highlights
• We learn compact features of topology from multiple heterogeneous networks
• Our features obtain state-of-the-art accuracy in diverse functional inference tasks
• Our method scales to many networks and can be broadly applied to network science

Correspondence
email@example.com (B.B.), firstname.lastname@example.org (J.P.)

In Brief
Mashup is a computational approach for integrating data across multiple networks by compactly representing the topological relationships between
...

SUMMARY
The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.

We performed 5-fold cross-validation to compare the function prediction performance of Mashup to other state-of-the-art network integration methods, GeneMANIA and STRING's Bayesian integration followed by a diffusion-based function prediction method DSD (STRING-DSD), in (A) human and (B) yeast. A precision-recall curve for each method is shown (C). Additional figures, including the results on molecular function (MF) and cellular component (CC) ontologies in human and further comparisons to other integration approaches, are provided in Figures S1, S2, and S3.
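The two steps outlined in the Summary (diffusion within each network, then compression into one low-dimensional vector per gene) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' procedure: the diffusion is modeled as a random walk with restart with a hypothetical `restart_prob` of 0.5, and the compression step swaps in a truncated SVD of log-transformed diffusion states as a simple stand-in for whatever dimensionality reduction the paper actually uses. The function names and toy networks are likewise invented for illustration.

```python
import numpy as np


def random_walk_with_restart(adj, restart_prob=0.5):
    """Return an n x n matrix whose column i is the stationary
    random-walk-with-restart distribution for a walk restarting at node i."""
    adj = np.array(adj, dtype=float)  # copy so the caller's matrix is untouched
    n = adj.shape[0]
    # Assumption: give isolated nodes a self-loop so every column can be normalized.
    isolated = adj.sum(axis=0) == 0
    adj[isolated, isolated] = 1.0
    transition = adj / adj.sum(axis=0)  # column-stochastic transition matrix
    # Closed form of s = (1 - p) * B @ s + p * e_i, solved for all start nodes at once.
    return restart_prob * np.linalg.inv(np.eye(n) - (1.0 - restart_prob) * transition)


def embed_networks(adjs, dim, restart_prob=0.5):
    """Compress the diffusion states of several networks over the same nodes
    into one dim-dimensional vector per node (SVD stand-in, not the paper's method)."""
    n = np.asarray(adjs[0]).shape[0]
    blocks = []
    for adj in adjs:
        diffusion = random_walk_with_restart(adj, restart_prob)
        # Row i of diffusion.T is node i's diffusion profile; smooth before the log.
        blocks.append(np.log(diffusion.T + 1.0 / n))
    stacked = np.hstack(blocks)  # shape: n x (n * number_of_networks)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Two toy networks over the same four genes: a path and a cycle.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
features = embed_networks([path, cycle], dim=2)
```

Each row of `features` is then a fixed-length vector for one gene, which is the form an off-the-shelf classifier expects as input.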
Performance is measured by the fraction of top predictions correctly labeled (Acc), the harmonic mean of precision and recall when the top three predictions are assigned to each gene (F1), and the area under the precision-recall curve summarized over all labels, under both the micro-averaging (m-PR) and macro-averaging (M-PR) schemes. Results are summarized over ten trials (SD shown as error bars), and asterisks mark cases where Mashup's improvement over GeneMANIA is significant (one-sided rank-sum p value < 0.01). Overall, Mashup achieves substantially greater predictive performance than previous methods.

Cell Systems 3, 1-9, December 21, 2016
Please cite this article in press as: Cho et al.
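For readers who want to see how the four measures in the caption differ, here is a minimal sketch under stated assumptions. The function names are hypothetical, the F1 is pooled over all gene-label pairs (the caption does not say how it is aggregated), and average precision is used as a common approximation of the area under the precision-recall curve; none of this is taken from the authors' evaluation code.

```python
import numpy as np


def top1_accuracy(scores, labels):
    """Fraction of genes whose single highest-scoring label is a true annotation."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(labels[np.arange(labels.shape[0]), top]))


def topk_f1(scores, labels, k=3):
    """F1 when each gene is assigned its k highest-scoring labels
    (assumption: counts are pooled over all gene-label pairs)."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    predicted = np.zeros(labels.shape, dtype=bool)
    np.put_along_axis(predicted, topk, True, axis=1)
    true_pos = np.sum(predicted & labels.astype(bool))
    if true_pos == 0:
        return 0.0
    precision = true_pos / predicted.sum()
    recall = true_pos / labels.sum()
    return float(2 * precision * recall / (precision + recall))


def average_precision(scores, labels):
    """Mean of precision values at the rank of each true positive."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order].astype(float)
    if hits.sum() == 0:
        return float("nan")  # undefined when there are no positives
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(np.sum(precision_at_rank * hits) / hits.sum())


def micro_pr(scores, labels):
    """Micro-averaging: pool every gene-label pair into one ranking."""
    return average_precision(scores.ravel(), labels.ravel())


def macro_pr(scores, labels):
    """Macro-averaging: score each label separately, then average over labels."""
    per_label = [average_precision(scores[:, j], labels[:, j])
                 for j in range(labels.shape[1])]
    return float(np.nanmean(per_label))


# Toy case: four genes, four labels, each gene has one true label scored highest.
toy_labels = np.eye(4, dtype=int)
toy_scores = toy_labels.astype(float)
acc = top1_accuracy(toy_scores, toy_labels)
f1_top3 = topk_f1(toy_scores, toy_labels, k=3)
m_pr = micro_pr(toy_scores, toy_labels)
M_pr = macro_pr(toy_scores, toy_labels)
```

Micro-averaging lets frequent labels dominate the score, while macro-averaging weights every label equally, which is why the two can diverge when label frequencies are skewed.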