Network module detection: Affinity search technique with the multi-node topological overlap measure

Ai Li, Steve Horvath
2009 BMC Research Notes  
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that accepts a multi-point dissimilarity measure d(i_1, i_2, ..., i_P), where the number of points P can be larger than 2. The work is motivated by gene network analysis, where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness that is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input to network neighborhood analysis.

Findings: We adapt network neighborhood analysis for use in module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global stopping rules for cluster growth. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoids (PAM) clustering.

Conclusion: Our flexible module detection method is implemented in the MTOM software, which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/

Findings

While most clustering procedures use a pairwise dissimilarity (distance) measure as input, we present a clustering procedure that can accommodate a multi-point dissimilarity measure d(i_1, i_2, ..., i_P), where P > 1 is the number of points and the indices i_k = 1, ..., n run over the n objects. Since we are mainly interested in a network application, we will refer to the objects as nodes and to the corresponding measure as a multi-node dissimilarity. A multi-node (P-point) dissimilarity measure d(i_1, i_2, ..., i_P) is defined to satisfy the following properties: i) it takes on non-negative values, i.e.
doi:10.1186/1756-0500-2-142 pmid:19619323 pmcid:PMC2727520 fatcat:csncrstb7nahbjwsi2646wepza
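
The idea of growing a cluster around a seed with a multi-node dissimilarity can be illustrated with a minimal Python sketch. It is not the MAST algorithm or the MTOM software: the dissimilarity below is a toy shared-neighbor score standing in for the multi-node topological overlap measure, and the names `shared_neighbor_dissimilarity`, `grow_cluster`, `max_dissimilarity` and `max_size` are hypothetical. The threshold only plays the role of a local stopping rule; the rules actually proposed in the paper are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Set

Graph = Dict[str, Set[str]]  # unweighted, undirected adjacency sets


def shared_neighbor_dissimilarity(graph: Graph, nodes: Sequence[str]) -> float:
    """Toy P-point dissimilarity in [0, 1] (an assumption, not the MTOM formula).

    Uses closed neighborhoods (each node plus its neighbors) and returns
    1 - |neighbors shared by all nodes| / |smallest closed neighborhood|.
    The value is non-negative and does not depend on the order of `nodes`.
    """
    closed = [graph[v] | {v} for v in nodes]
    common = set.intersection(*closed)
    smallest = min(len(c) for c in closed)
    return 1.0 - len(common) / smallest


def grow_cluster(
    graph: Graph,
    seed: str,
    dissimilarity: Callable[[Graph, Sequence[str]], float],
    max_dissimilarity: float,
    max_size: int,
) -> List[str]:
    """Greedily grow a cluster around `seed`.

    At each step the candidate whose addition gives the lowest multi-node
    dissimilarity of the whole cluster is added. Growth stops when that
    dissimilarity would exceed `max_dissimilarity` or when the cluster
    reaches `max_size`.
    """
    cluster = [seed]
    candidates = set(graph) - {seed}
    while candidates and len(cluster) < max_size:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = min(
            sorted(candidates),
            key=lambda v: dissimilarity(graph, cluster + [v]),
        )
        if dissimilarity(graph, cluster + [best]) > max_dissimilarity:
            break
        cluster.append(best)
        candidates.remove(best)
    return cluster


# Two triangles (a-b-c and d-e-f) joined by the single edge c-d.
example_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

if __name__ == "__main__":
    module = grow_cluster(
        example_graph, "a", shared_neighbor_dissimilarity,
        max_dissimilarity=0.5, max_size=6,
    )
    print(module)  # ['a', 'b', 'c']
```

Because the dissimilarity is evaluated on the whole candidate cluster rather than on pairs, the node `d` is rejected even though it is directly connected to `c`: it shares only one closed-neighborhood member with the triangle as a group.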