Contrasting Subgroup Discovery

L. Langohr, V. Podpecan, M. Petek, I. Mozetic, K. Gruden, N. Lavrac, H. Toivonen
2012, The Computer Journal
Subgroup discovery methods find interesting subsets of objects of a given class. Motivated by an application in bioinformatics, we first define a generalized subgroup discovery problem. In this setting, a subgroup is interesting if its members are characteristic of their class, even if the classes are not identical. Then we further refine this setting for the case where subsets of objects, for example, subsets of objects that represent different time points or different phenotypes, are ... ed. We show that this allows finding subgroups of objects that could not be found with classical subgroup discovery. To find such subgroups, we propose an approach that consists of two subgroup discovery steps and an intermediate contrast set definition step. This approach is applicable in various application areas. An example is biology, where interesting subgroups of genes are searched for using gene expression data. We address the problem of finding enriched gene sets that are specific to virus-infected samples for a specific time point or a specific phenotype. We report experimental results on a time series dataset for virus-infected Solanum tuberosum (potato) plants. The results on S. tuberosum's response to virus infection revealed new research hypotheses for plant biologists.
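For readers unfamiliar with the classical setting that the abstract contrasts against, the sketch below scores candidate subgroups for a single target class using weighted relative accuracy, WRAcc(Cond) = p(Cond) * (p(Class | Cond) - p(Class)), a standard subgroup discovery quality measure. It is a minimal sketch, not the authors' two-step contrasting method: it only enumerates single attribute=value conditions. The function names, the "class" label key, and the toy gene annotations are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each object is a dict of attribute -> value,
# with its class label stored under label_key.
Example = Dict[str, object]


def wracc(examples: List[Example], condition: Callable[[Example], bool],
          target_class: object, label_key: str = "class") -> float:
    """Weighted relative accuracy: p(Cond) * (p(Class|Cond) - p(Class))."""
    n = len(examples)
    covered = [e for e in examples if condition(e)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_class = sum(1 for e in examples if e[label_key] == target_class) / n
    p_class_given_cond = sum(
        1 for e in covered if e[label_key] == target_class) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


def best_single_conditions(examples: List[Example], target_class: object,
                           k: int = 3, label_key: str = "class"
                           ) -> List[Tuple[Tuple[str, object], float]]:
    """Rank all attribute=value conditions by WRAcc and return the top k."""
    candidates = []
    attrs = {a for e in examples for a in e if a != label_key}
    for attr in sorted(attrs):
        for value in sorted({e.get(attr) for e in examples}, key=repr):
            # Bind attr/value as defaults so each lambda keeps its own pair.
            cond = lambda e, a=attr, v=value: e.get(a) == v
            score = wracc(examples, cond, target_class, label_key)
            candidates.append(((attr, value), score))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:k]


# Hypothetical toy data: genes annotated with a term and an expression class.
genes = [
    {"term": "photosynthesis", "class": "up"},
    {"term": "photosynthesis", "class": "up"},
    {"term": "defense", "class": "down"},
    {"term": "defense", "class": "up"},
]

print(best_single_conditions(genes, target_class="up", k=1))
```

Here the condition term=photosynthesis covers half of the genes, all of them "up", against a 75% "up" baseline, so its WRAcc is 0.5 * (1.0 - 0.75) = 0.125. A classical search of this kind evaluates each subgroup against one fixed class; the generalized setting described in the abstract relaxes exactly that restriction.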
doi:10.1093/comjnl/bxs132