BICLUSTERING METHODS FOR RE-ORDERING DATA MATRICES IN SYSTEMS BIOLOGY, DRUG DISCOVERY AND TOXICOLOGY

Christodoulos A. Floudas
2010 Croatian Operational Research Review  
Biclustering has emerged as an important problem in the analysis of gene expression data, since genes may respond jointly over only a subset of conditions. Many biclustering methods, and clustering algorithms in general, rely on simplified models or heuristic strategies to identify the "best" grouping of elements under a given metric and cluster definition, and can therefore produce suboptimal clusters. In the first part of the presentation, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize a dissimilarity metric [1,2]. The permutations of the rows and columns of the data matrix can be modeled either as a network flow problem or as a traveling salesman problem. OREO is tested on several important data matrices arising in systems biology, both to validate the proposed method and to compare it with existing biclustering and clustering methods.

In the second part of the presentation, we focus on novel methods for clustering very sparse data matrices [3]. Such matrices arise in drug discovery, where the x- and y-axes of a data matrix can correspond to different functional groups at two distinct substituent sites on a molecular scaffold. Each (x, y) pair corresponds to a single molecule that can be synthesized and tested for a certain property, such as percent inhibition of a protein function. For even moderately sized matrices, synthesizing and testing even a small fraction of the molecules is labor intensive and costly. It is therefore essential to have a reliable method for guiding the synthesis process toward molecules with a high probability of success. To this end, we introduce a new strategy that enables efficient substituent reordering and descriptor-free property estimation. Our approach casts substituent reordering as a special high-dimensional rearrangement clustering [...]
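The idea of reordering rows so that similar rows sit next to each other, which the abstract attributes to OREO, can be illustrated at toy scale. The sketch below is not the authors' implementation and makes several assumptions: the dissimilarity between adjacent rows is taken to be the squared Euclidean distance, the row ordering is treated as an open traveling salesman (shortest Hamiltonian path) problem, and that problem is solved exactly with Held-Karp dynamic programming, which is tractable only for a handful of rows. Columns are reordered by applying the same routine to the transpose. The function names `pairwise_cost`, `optimal_order` and `reorder_matrix` are hypothetical, and the sketch does not reproduce OREO's network flow or TSP formulations or how biclusters are extracted from the reordered matrix.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def pairwise_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED dissimilarity between two adjacent rows: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def optimal_order(rows: Sequence[Sequence[float]]) -> List[int]:
    """Exact row ordering minimizing the summed dissimilarity of adjacent rows.

    The ordering is an open traveling salesman (shortest Hamiltonian path)
    problem, solved here with Held-Karp dynamic programming in O(n^2 * 2^n)
    time, so this toy solver only scales to a handful of rows.
    """
    n = len(rows)
    if n <= 2:
        return list(range(n))  # every ordering of <= 2 rows has the same cost
    d = [[pairwise_cost(rows[i], rows[j]) for j in range(n)] for i in range(n)]

    # best[(mask, j)] = (cost, prev): cheapest path that visits exactly the
    # rows in bitmask `mask` and ends at row j; `prev` is the row before j.
    best: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for j in range(n):
        best[(1 << j, j)] = (0.0, -1)
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = sum(1 << k for k in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, i)][0] + d[i][j], i) for i in subset if i != j
                )

    full = (1 << n) - 1
    end = min(range(n), key=lambda j: best[(full, j)][0])
    # Walk the `prev` pointers back from the cheapest endpoint.
    order: List[int] = []
    mask, j = full, end
    while j != -1:
        order.append(j)
        j_prev = best[(mask, j)][1]
        mask ^= 1 << j
        j = j_prev
    return order[::-1]


def reorder_matrix(matrix: Matrix) -> Tuple[Matrix, List[int], List[int]]:
    """Reorder rows, then columns (via the transpose), each independently."""
    row_order = optimal_order(matrix)
    col_order = optimal_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return reordered, row_order, col_order


if __name__ == "__main__":
    shuffled = [[1, 1, 1], [5, 5, 5], [2, 2, 2], [4, 4, 4], [3, 3, 3]]
    print(optimal_order(shuffled))
```

For the shuffled rows in the usage example, the optimal ordering places the rows in monotone order of their values (in one direction or the other, since reversing a path does not change its cost). Reordering rows and columns independently keeps the sketch simple; an exact solver of this kind is only a didactic stand-in for the larger-scale formulations the abstract describes.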