BicPAM: Pattern-based biclustering for biomedical data analysis

Rui Henriques, Sara C Madeira
2014 Algorithms for Molecular Biology  
Biclustering, the discovery of sets of objects with a coherent pattern across a subset of conditions, is a critical task for studying a wide set of biomedical problems in which molecular units or patients are meaningfully related to a set of properties. The challenging combinatorial nature of this task led to the development of approaches with restrictions on the allowed type, number and quality of biclusters. In contrast, recent biclustering approaches relying on pattern mining methods can
...vely discover flexible structures of robust biclusters. However, these approaches are only prepared to discover constant biclusters, and their underlying contributions remain dispersed. Methods: The proposed BicPAM biclustering approach integrates existing principles made available by state-of-the-art pattern-based approaches with two new contributions. First, BicPAM is the first efficient attempt to exhaustively mine non-constant types of biclusters, including additive and multiplicative coherencies in the presence or absence of symmetries. Second, BicPAM provides strategies to effectively compose different biclustering structures and to handle arbitrary levels of noise, whether inherent to the data or introduced by discretization procedures. Results: Results show BicPAM's superiority over its peers and its ability to retrieve unique types of biclusters of interest, to efficiently deliver exhaustive solutions and to successfully recover planted biclusters in datasets with varying levels of missing values and noise. Its application to gene expression data leads to unique solutions with heightened biological relevance. Conclusions: BicPAM integrates existing dispersed efforts towards pattern-based biclustering and provides the first critical strategies to efficiently discover exhaustive solutions of biclusters with shifting, scaling and symmetric assumptions, with varying quality and underlying structures. Additionally, BicPAM dynamically adapts its behavior to mine data with different levels of missing values and noise.

Definition 2. Let L be a finite set of items and P an itemset, P ⊆ L. A transaction t is a pair (t_id, P) with id ∈ ℕ. An itemset database D over L is a finite set of transactions {t_1, ..., t_n}.

Definition 3. A transaction (t_id, P) contains an itemset P', denoted P' ⊆ (t_id, P), if P' ⊆ P. The coverage Φ_P of an itemset P is the set of all transactions in D in which the itemset P occurs: Φ_P = {t ∈ D | P ⊆ t}.
The support of an itemset P in D, denoted sup_P, can either be absolute, being its coverage size |Φ_P|, or relative, given by |Φ_P|/|D|.

Definition 4. Given an itemset database D and a minimum support threshold θ, the frequent itemset mining (FIM) problem consists of computing the set {P | P ⊆ L, sup_P ≥ θ}.
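
As an illustration of Definitions 2 to 4, the following minimal Python sketch computes coverage, support and the set of frequent itemsets by brute-force enumeration. It is not the mining procedure used by BicPAM; the function names (`coverage`, `support`, `frequent_itemsets`) and the toy database are assumptions made for this example only, and the empty itemset is skipped for readability.

```python
from itertools import combinations


def coverage(itemset, database):
    """Return the transactions (id, items) whose items contain `itemset`."""
    return [(tid, items) for tid, items in database if itemset <= items]


def support(itemset, database, relative=False):
    """Absolute support is the coverage size; relative support divides by |D|."""
    size = len(coverage(itemset, database))
    return size / len(database) if relative else size


def frequent_itemsets(database, theta):
    """Brute-force FIM: non-empty itemsets with absolute support >= theta."""
    items = sorted(set().union(*(p for _, p in database)))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            s = support(frozenset(candidate), database)
            if s >= theta:
                result[frozenset(candidate)] = s
                found = True
        if not found:
            # Support is anti-monotone: if no k-itemset is frequent,
            # no larger itemset can be frequent either.
            break
    return result


# Hypothetical toy database D over L = {a, b, c, d}.
database = [
    (1, frozenset({"a", "b", "c"})),
    (2, frozenset({"a", "b"})),
    (3, frozenset({"a", "c"})),
    (4, frozenset({"b", "d"})),
]

frequent = frequent_itemsets(database, theta=2)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With θ = 2 this toy database yields the frequent itemsets {a}, {b}, {c}, {a, b} and {a, c}. Exhaustive enumeration is exponential in |L|, so the sketch only serves to make the definitions concrete; practical miners prune the search space far more aggressively.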
doi:10.1186/s13015-014-0027-z pmid:25649207 pmcid:PMC4302537 fatcat:ucjj6bxwxjfodipiwpz5qjzwg4