SEMI SUPERVISED CLUSTERING ENSEMBLE FOR HIGH DIMENSIONAL DATA

M Pavithra, Parvathi, Dean-Pg Studies
2017 unpublished
Cluster analysis methods seek to partition a data set into homogeneous subgroups. Cluster analysis is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations.
The objective of cluster analysis is to partition a data set into a collection of subsets (i.e. "clusters") such that observations within a cluster are more similar to one another than to observations in other clusters. For a more detailed discussion, see Hastie et al. or Gordon. Traditional clustering methods are unsupervised, meaning that there is no outcome measure and nothing is known about the relationship between the observations in the data set. However, in many situations one may wish to perform cluster analysis even though an outcome variable exists or some preliminary information about the clusters is known.
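To make the "labels of some observations may be known" case concrete, the following is a minimal sketch of a seeded variant of k-means in plain Python. It is an illustration only and is not taken from the record above: the function name `seeded_kmeans`, its parameters, and the toy data are all hypothetical. It assumes Euclidean distance and that every cluster has at least one labelled observation.

```python
import math


def seeded_kmeans(points, seed_labels, k, n_iter=20):
    """Cluster `points` into k groups, using a few known labels as seeds.

    points      : list of equal-length tuples of floats
    seed_labels : dict mapping a point index to its known cluster id (0..k-1)
    Assumption  : every cluster id has at least one seed.
    """
    dim = len(points[0])

    def mean(indices):
        # Coordinate-wise mean of the points at the given indices.
        return tuple(
            sum(points[i][d] for i in indices) / len(indices) for d in range(dim)
        )

    # Initialise each centroid from the labelled observations of that cluster.
    centroids = [
        mean([i for i, lab in seed_labels.items() if lab == c]) for c in range(k)
    ]

    labels = None
    for _ in range(n_iter):
        # Assignment step: labelled points keep their known label,
        # unlabelled points go to the nearest centroid.
        new_labels = [
            seed_labels[i]
            if i in seed_labels
            else min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups, one known label in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seed_labels = {0: 0, 3: 1}
labels, centroids = seeded_kmeans(points, seed_labels, k=2)
print(labels)
```

Keeping the labelled observations fixed is one design choice among several; a softer alternative would use the seeds only for initialisation and let later iterations reassign them.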
fatcat:dguflurgunejldq7wl2rb2cduq