Noise Resistant Generalized Parametric Validity Index of Clustering for Gene Expression Data

Rui Fa, Asoke K. Nandi
2014 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters a and b to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
doi:10.1109/tcbb.2014.2312006 pmid:26356344 fatcat:lka3tsisqvdubfz67ykc2zueje