An iterative strategy for pattern discovery in high-dimensional data sets

Chun Tang, Aidong Zhang
2002 Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02  
High-dimensional data representation, in which each data item (termed a target object) is described by many features, is a necessary component of many applications. For example, in DNA microarrays, each sample (target object) is represented by thousands of genes as features. Pattern discovery of target objects presents interesting but also very challenging problems. The data sets are typically not task-specific: many features are irrelevant or redundant and should be pruned or filtered out for the purpose of classifying target objects to find empirical patterns. Uncertainty about which features are relevant makes it difficult to construct an informative feature space. This paper proposes an iterative strategy for pattern discovery in high-dimensional data sets. In this approach, the iterative process consists of two interacting components: discovering patterns within target objects and pruning irrelevant features. The performance of the proposed method on various real data sets is also illustrated.
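
The alternation between the two components can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a plain 2-means grouping as the pattern-discovery step and a simple mean-separation score as the pruning criterion. The names `two_means`, `separation`, and `iterative_discovery`, the parameters `keep_fraction` and `min_features`, and the toy data are all hypothetical.

```python
from statistics import mean, pvariance


def two_means(rows, features, iters=20):
    """Group rows into two clusters using only the listed feature indices."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)

    # Deterministic start: the first row and the row farthest from it.
    centers = [list(rows[0]), list(max(rows, key=lambda r: dist(r, rows[0])))]
    labels = None
    for _ in range(iters):
        new_labels = [0 if dist(r, centers[0]) <= dist(r, centers[1]) else 1
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels


def separation(rows, labels, f):
    """Score feature f by how well it separates the two current groups."""
    g0 = [r[f] for r, lab in zip(rows, labels) if lab == 0]
    g1 = [r[f] for r, lab in zip(rows, labels) if lab == 1]
    if not g0 or not g1:
        return 0.0
    spread = pvariance(g0) ** 0.5 + pvariance(g1) ** 0.5
    return abs(mean(g0) - mean(g1)) / (spread + 1e-9)


def iterative_discovery(rows, keep_fraction=0.5, min_features=2, max_rounds=10):
    """Alternate between grouping the objects and pruning weak features."""
    features = list(range(len(rows[0])))
    labels = two_means(rows, features)
    for _ in range(max_rounds):
        if len(features) <= min_features:
            break
        ranked = sorted(features,
                        key=lambda f: separation(rows, labels, f),
                        reverse=True)
        n_keep = max(min_features, int(len(ranked) * keep_fraction))
        features = sorted(ranked[:n_keep])      # pruning step
        labels = two_means(rows, features)      # pattern-discovery step
    return labels, features


# Toy data (made up): features 0 and 1 carry the group structure,
# features 2 to 5 are noise.
rows = [
    [0.1, 0.2, 5, 1, 3, 7],
    [0.0, 0.1, 1, 6, 2, 2],
    [0.2, 0.0, 3, 3, 9, 4],
    [9.9, 10.1, 4, 2, 8, 1],
    [10.0, 9.8, 2, 7, 1, 6],
    [10.2, 10.0, 6, 4, 4, 3],
]
labels, kept = iterative_discovery(rows)
print(labels, kept)
```

Each round uses the current grouping to decide which features to drop, then regroups the objects in the reduced feature space, so the two steps inform each other. Any clustering method or relevance score could be substituted for the ones assumed here.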
doi:10.1145/584796.584798 fatcat:hkwzbzfbqbeshc7mnrzdatqoem