Optimally Orienting Physical Networks

Dana Silverbush, Michael Elberfeld, Roded Sharan
2011 Journal of Computational Biology  
In a network orientation problem one is given a mixed graph, consisting of directed and undirected edges, and a set of source-target vertex pairs. The goal is to orient the undirected edges so that a maximum number of pairs admit a directed path from the source to the target. This problem is NP-complete and no approximation algorithms are known for it. It arises in the context of analyzing physical networks of protein-protein and protein-DNA interactions. While the latter are naturally directed from a transcription factor to a gene, the direction of signal flow in protein-protein interactions is often unknown or cannot be measured en masse. One then tries to infer this information by using causality data on pairs of genes such that the perturbation of one gene changes the expression level of the other gene. Here we provide a first polynomial-size ILP formulation for this problem. We apply our algorithm to orient protein-protein interactions in yeast and measure our performance using edges with known orientations. We find that our algorithm achieves high accuracy and coverage in the orientation, outperforming simplified versions that do not use information on edge directions. The obtained orientations can lead to a better understanding of the structure and function of the network.
These authors contributed equally to this work.
High-throughput technologies are routinely used nowadays to detect physical interactions in the cell, including chromatin immunoprecipitation experiments for measuring protein-DNA interactions (PDIs) [11], and yeast two-hybrid assays [7] and co-immunoprecipitation screens [9] for measuring protein-protein interactions (PPIs). These networks serve as the scaffold for signal processing in the cell and are thus key to understanding the cellular response to different genetic or environmental cues. While PDIs are naturally directed (from a transcription factor to its regulated genes), PPIs are not. Nevertheless, many PPIs transmit signals in a directional fashion, with kinase-substrate interactions (KPIs) being one of the prime examples. These directions are vital to understanding signal flow in the cell, yet most current techniques do not measure them. Instead, one tries to infer them from perturbation experiments, in which a gene (the cause) is perturbed and, as a result, other genes change their expression levels (the effects).
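The abstract scores predicted orientations against interactions whose direction is already known. One plausible way to set up such a test, sketched below in Python under our own assumptions, is to strip the direction from a random fraction of directed edges, hand those edges to an orientation method as undirected ones, and count how many come back correctly oriented. The helper names `hide_directions` and `orientation_accuracy`, the sampling scheme and the accuracy score are illustrative choices rather than the paper's protocol, and the toy edge names are hypothetical; a coverage score is left out because this section does not define it.

```python
import random

def hide_directions(directed, fraction, seed=0):
    """Hypothetical test setup: remove the direction of a random fraction of edges.

    Returns (kept, hidden, truth). `kept` stays directed; `hidden` lists the
    hidden edges with their endpoints in random order, and truth[i] is True
    exactly when hidden[i] = (u, v) was originally directed u -> v.
    """
    rng = random.Random(seed)
    k = round(fraction * len(directed))
    hidden_idx = set(rng.sample(range(len(directed)), k))
    kept, hidden, truth = [], [], []
    for i, (u, v) in enumerate(directed):
        if i not in hidden_idx:
            kept.append((u, v))
        elif rng.random() < 0.5:
            hidden.append((u, v))
            truth.append(True)
        else:
            # Swap endpoints so the stored order leaks no direction.
            hidden.append((v, u))
            truth.append(False)
    return kept, hidden, truth

def orientation_accuracy(predicted, truth):
    """Fraction of hidden edges whose predicted direction matches the known one.

    predicted[i] uses the same convention as truth[i]: True means hidden[i]
    = (u, v) is oriented u -> v.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need exactly one prediction per hidden edge")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical toy network: two protein-DNA edges and two kinase-substrate edges.
edges = [("TF1", "g1"), ("TF1", "g2"), ("K1", "TF1"), ("K2", "K1")]
kept, hidden, truth = hide_directions(edges, fraction=0.5, seed=1)
print(orientation_accuracy(truth, truth))  # a perfect predictor scores 1.0
```

In a full experiment, `kept` would be passed to an orientation method as the directed edges and `hidden` as the undirected ones, and the method's output would be scored with `orientation_accuracy`. Shuffling the endpoint order of hidden edges keeps the test honest, since a method could otherwise score perfectly by always orienting edges in their stored order.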
Assuming that each cause-effect pair should be connected by a directed pathway in the physical network, one can predict an orientation (an assignment of directions) of the undirected part of the network that best agrees with the cause-effect information. The resulting combinatorial problem can be formalized by representing the network as a mixed graph, in which undirected edges model interactions with unknown causal direction and directed edges represent interactions with known directionality. The cause-effect pairs are modeled by a collection of source-target vertex pairs. The goal is to orient (assign a single direction to each of) the undirected edges so that a maximum number of source-target pairs admit a directed path from the source to the target.
Previous work on this and related problems can be classified into theoretical and applied work. On the theoretical side, Arkin and Hassin [1] studied the decision problem of orienting a mixed graph and showed that it is NP-complete. Decision and optimization versions of the problem of finding reachability-preserving orientations are well studied for the case where the set of vertex pairs contains all vertex pairs of the graph [16, 3, 6, 5, 10]. For a comprehensive discussion of the various kinds of graph orientations (not necessarily reachability-preserving), we refer to the textbook of Bang-Jensen and Gutin [2]. For the special case of an undirected network (with no pre-directed edges), the orientation problem was shown to be NP-complete and hard to approximate to within a constant factor of 11/12 [13]. On the positive side, Medvedovsky et al. [13] provided an ILP-based algorithm and showed that the problem is approximable to within a ratio of O(log n), where n is the number of vertices in the network. The approximation ratio was recently improved to O(log n / log log n) [8].
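To make the objective concrete, the Python sketch below solves tiny instances exactly by trying all 2^m orientations of the m undirected edges and counting, by breadth-first search, how many source-target pairs become connected. It is exponential in m and therefore only usable as a reference on toy examples (the problem is NP-complete); it is not the paper's ILP formulation. The edge-list representation and the function names `satisfied_pairs` and `best_orientation` are our own hypothetical choices.

```python
from collections import defaultdict, deque
from itertools import product

def satisfied_pairs(directed, undirected, orientation, pairs):
    """Count source-target pairs joined by a directed path.

    orientation[i] is True if undirected[i] = (u, v) is oriented u -> v,
    and False if it is oriented v -> u.
    """
    adj = defaultdict(list)
    for u, v in directed:
        adj[u].append(v)
    for (u, v), forward in zip(undirected, orientation):
        a, b = (u, v) if forward else (v, u)
        adj[a].append(b)
    count = 0
    for s, t in pairs:
        # Breadth-first search from the source over the oriented graph.
        seen, queue = {s}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        count += t in seen
    return count

def best_orientation(directed, undirected, pairs):
    """Exhaustive search over all 2**len(undirected) orientations."""
    best_score, best = -1, None
    for orientation in product((True, False), repeat=len(undirected)):
        score = satisfied_pairs(directed, undirected, orientation, pairs)
        if score > best_score:
            best_score, best = score, orientation
    return best_score, best

# Toy instance: one edge with a known direction (b -> c), two undirected edges.
directed = [("b", "c")]
undirected = [("a", "b"), ("b", "d")]
pairs = [("a", "c"), ("d", "c"), ("a", "d")]
print(best_orientation(directed, undirected, pairs))  # prints (2, (True, True))
```

No orientation satisfies all three pairs in this toy instance: the pair (a, d) needs the edge b-d oriented b -> d, while the pair (d, c) needs it oriented d -> b, so the optimum is 2. Such conflicts between pairs are what make the optimization nontrivial.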
The authors of [8] also considered the more general problem on mixed graphs, but the polylogarithmic approximation ratio they attained is not satisfactory, as its exponent depends on certain properties of the paths involved. On the practical side, several authors studied the orientation problem and related annotation problems using statistical approaches [18, 14]. However, these approaches rely on enumerating all paths up to a certain length between a pair of nodes, which makes them infeasible on large networks.
Our main contribution in this paper is a first efficient ILP formulation of the orientation problem on mixed graphs, leading to an optimal solution of the problem on current networks. We implemented our approach and applied it to a large data set of physical interactions and knockout pairs in yeast. We collected interaction and cause-effect pair information from different publications and integrated it into a physical network with 3,660 proteins, 4,000 PPIs and 4,095 PDIs, along with 53,809 knockout pairs among the molecular components of the network. We carried out a number of experiments to measure the accuracy of the orientations produced by our method for different input scenarios. In particular, we study how the portion of undirected interactions and the number of cause-effect pairs affect the orientations. We further compare our performance to that of two naive approaches that are based on orienting undirected networks, ignoring
doi:10.1089/cmb.2011.0163 pmid:21999286