Bottom-Up Biclustering of Expression Data

Kenneth Bryan, Padraig Cunningham
2006 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology  
In a gene expression data matrix, a bicluster is a sub-matrix of genes and conditions that exhibits a high correlation of expression activity across both rows and columns. The premise behind biclustering is that even related genes may only be expressed in a synchronized fashion over certain conditions. Conventional clustering groups over all features and may not capture these local relationships. Biclustering has the potential to retrieve these local signals and also to model overlapping groups of genes. These factors allow better representation of the natural state of functional modules in the cell. The mean squared residue is a popular measure of bicluster quality. One drawback, however, is that it is biased toward flat biclusters with low row variance. In this paper we introduce an improved bicluster score that removes this bias and promotes the discovery of the most significant biclusters in the dataset. We employ this score within a new biclustering approach based on the bottom-up search strategy. We believe that the bottom-up search approach better models the underlying functional modules of the gene expression dataset. We evaluate our new score against the mean squared residue score using a yeast cell cycle expression dataset. We then carry out a comparative analysis of our biclustering technique against previously published clustering and biclustering approaches. Lastly, we use the biclusters discovered by our method to attempt to putatively annotate unclassified genes.
doi:10.1109/cibcb.2006.330995 fatcat:3nidgcvumfaqvg7uflsup3tx6u
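
The abstract's remark about the mean squared residue can be illustrated with a small sketch. It assumes the commonly used definition of the score (the mean of the squared residues a_ij - a_iJ - a_Ij + a_IJ over the sub-matrix); the function names `mean_squared_residue` and `mean_row_variance` and the two toy matrices are hypothetical and chosen only for illustration. It does not reproduce the improved score proposed in the paper, which the abstract does not specify.

```python
def mean_squared_residue(sub):
    """Mean squared residue of a bicluster given as a list of equal-length rows.

    Assumes the usual definition: the average over all cells of
    (a_ij - row_mean_i - col_mean_j + overall_mean) ** 2.
    """
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(row[j] for row in sub) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(sub):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def mean_row_variance(sub):
    """Average, over rows, of each row's variance across the conditions."""
    n_cols = len(sub[0])
    variances = []
    for row in sub:
        mean = sum(row) / n_cols
        variances.append(sum((v - mean) ** 2 for v in row) / n_cols)
    return sum(variances) / len(variances)


# Hypothetical toy data: a flat bicluster (no change across conditions) ...
flat = [[3.0, 3.0, 3.0, 3.0],
        [3.0, 3.0, 3.0, 3.0]]

# ... and one whose genes rise and fall together, offset by a constant.
coherent = [[1.0, 2.0, 4.0, 5.0],
            [3.0, 4.0, 6.0, 7.0]]

for name, sub in (("flat", flat), ("coherent", coherent)):
    print(name, mean_squared_residue(sub), mean_row_variance(sub))
```

Both toy matrices receive the same residue score, so the residue alone cannot separate a flat sub-matrix from one with synchronized fluctuation; only the row variance tells them apart. This is one way to read the bias described in the abstract.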