An Interior Point Algorithm for Minimum Sum-of-Squares Clustering

O. du Merle, P. Hansen, B. Jaumard, N. Mladenovic
1999 SIAM Journal on Scientific Computing  
An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program in 0-1 variables. The resolution method combines an interior point algorithm, i.e., a weighted analytic center column generation method, with branch-and-bound. The auxiliary problem of determining the entering column (i.e., the oracle) is an unconstrained hyperbolic program in 0-1 variables with a quadratic numerator and linear denominator. It is solved through a sequence of unconstrained quadratic programs in 0-1 variables. To accelerate resolution, variable neighborhood search heuristics are used both to get a good initial solution and to solve quickly the auxiliary problem as long as global optimality is not reached. Estimated bounds for the dual variables are deduced from the heuristic solution and used in the resolution process as a trust region. Proved minimum sum-of-squares partitions are determined for the first time for several fairly large data sets from the literature, including Fisher's 150 iris.

Introduction. Cluster analysis addresses the following general problem: Given a set of entities, find subsets, or clusters, which are homogeneous and/or well separated (Hartigan [25], Gordon [15], Kaufman and Rousseeuw [28], Mirkin [36]). This problem has many applications in engineering, medicine, and both the natural and the social sciences. The concepts of homogeneity and separation can be made precise in many ways. Moreover, a priori constraints, or in other words a structure, can be imposed on the clusters. This leads to many clustering problems and even more algorithms.

The most studied and used methods of cluster analysis belong to two categories: hierarchical clustering and partitioning. Hierarchical clustering algorithms give a hierarchy of partitions, which are jointly composed of clusters either disjoint or included one into the other. Those algorithms are agglomerative or, less often, divisive. In the first case, they proceed from an initial partition, in which each cluster contains a single entity, by successive merging of pairs of clusters until all entities are in the same one.
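The sum-of-squares objective defined in the abstract can be made concrete with a short sketch. This is our own minimal illustration, not code from the paper; the function and variable names are ours:

```python
# Minimal sketch of the MSSC objective: the sum, over all entities, of the
# squared Euclidean distance to the centroid of the cluster containing it.

def centroid(points):
    """Coordinate-wise mean of a list of points in R^m."""
    m = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(m)]

def sum_of_squares(clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sum((p[d] - c[d]) ** 2 for d in range(len(c)))
                     for p in cluster)
    return total

# Two well-separated clusters in the plane.
partition = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(sum_of_squares(partition))  # each cluster contributes 2.0 -> 4.0
```

The exact algorithm of the paper searches over all such partitions for the one minimizing this quantity.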
In the second case, they proceed from an initial partition with all entities in the same cluster, by successive bipartitions of one cluster at a time until all entities are isolated, one in each cluster. The best partition is then chosen from the hierarchy of partitions obtained, usually in an informal way. A graphical representation of results, such as a dendrogram or an espalier (Hansen, Jaumard, and Simeone [23]), is useful for that purpose. Hierarchical clustering methods use an objective (sometimes implicit) function locally, i.e., at each iteration. With the exception of the single linkage algorithm (Johnson [27], Gower and Ross [17]), which maximizes the split of all partitions obtained (Delattre and Hansen [2]), hierarchical algorithms do not give optimal partitions for their criterion after several agglomerations or divisions.

In contrast, partitioning algorithms assume given the number of clusters to be found (or use it as a parameter) and seek to optimize exactly or approximately an objective function. Among the many criteria used in cluster analysis, the minimum sum of squared distances from each entity to the centroid of the cluster to which it belongs, or minimum sum-of-squares for short, is one of the most used. It is a criterion for both homogeneity and separation, as minimizing the within-clusters sum-of-squares is equivalent to maximizing the between-clusters sum-of-squares. Both hierarchical and nonhierarchical procedures for minimum sum-of-squares clustering (MSSC) have long been used. Ward's [45] method is a hierarchical agglomerative one. It fits in Lance and Williams's [32] general scheme for agglomerative hierarchical clustering and can therefore be implemented in O(N^2 log N), where N is the number of entities considered. Moreover, using chains of near-neighbors, an O(N^2) implementation can be obtained (Benzecri [1], Murtagh [38]). Divisive hierarchical clustering is more difficult.
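The within/between equivalence mentioned above rests on the classical decomposition of the total sum-of-squares about the grand centroid into a within-cluster and a between-cluster term. A small numerical check (our own sketch, with our own names):

```python
# Verify numerically that: total SS about the grand centroid
#   = within-cluster SS + between-cluster SS,
# so minimizing one term is equivalent to maximizing the other.

def centroid(points):
    m = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(m)]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
all_points = [p for c in clusters for p in c]
g = centroid(all_points)  # grand centroid of all entities

total = sum(sq_dist(p, g) for p in all_points)
within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
between = sum(len(c) * sq_dist(centroid(c), g) for c in clusters)

print(total, within + between)  # the two numbers coincide: 104.0 104.0
```

Since the total is a constant of the data, any partition that minimizes `within` automatically maximizes `between`.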
If the dimension m of the space to which the entities to be classified belong is fixed, a polynomial algorithm in O(N^(m+1) log N) can be obtained (Hansen, Jaumard, and Mladenović [22]). In practice, problems with m = 2, N ≤ 20000; m = 3, N ≤ 1000; and m = 4, N ≤ 200 can be solved in reasonable computing time. Otherwise, one must use heuristics.

Postulating a hierarchical structure for the partitions obtained for MSSC is a strong assumption. In most cases, direct minimization of the sum-of-squares criterion among partitions with a given number M of clusters appears to be preferable. This has traditionally been done with heuristics, the best known of which is KMEANS [33] (see, e.g., Gordon and Henderson [16] and Gordon [15] for surveys of these heuristics). KMEANS proceeds from an initial partition to local improvements by reassignment of one entity at a time, recomputing the centroids of the two clusters to which this entity belonged and now belongs, until stability is reached. The procedure is repeated a given number of times to obtain a good local optimum. It has long been known that entities in two clusters are separated by the hyperplane perpendicular to the line joining their centroids and intersecting it at its midpoint; see, e.g., Gordon and Henderson [16]. This implies that an optimal partition corresponds to a Voronoi diagram. Such a property can be exploited in heuristics but does not lead to an efficient exact algorithm, as enumeration of Voronoi diagrams is time-consuming, even in two-dimensional space (Inaba, Katoh, and Imai [26]).

Not much work appears to have been devoted, until now, to exact resolution of MSSC. The problem was formulated mathematically by Vinod [44] and Rao [39], but little was done there toward its resolution. Koontz, Narendra, and Fukunaga [31] propose a branch-and-bound algorithm, which was refined by Diehr [3].
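The alternating improvement scheme behind KMEANS can be sketched compactly. This is our own illustration of the batch variant (assign every entity to its nearest centroid, then recompute centroids, and repeat until stable); the paper's description moves one entity at a time, but the fixed points are the same nearest-centroid partitions:

```python
# Batch k-means sketch: alternate assignment and centroid update steps
# until no assignment changes. Real implementations add random restarts
# to escape poor local optima.

def centroid(points):
    m = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(m))

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers):
    while True:
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            clusters[k].append(p)
        # Update step: recompute centroids of non-empty clusters.
        new_centers = [centroid(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # stability reached: local optimum
            return clusters, centers
        centers = new_centers

pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
clusters, centers = kmeans(pts, [(0.0, 0.0), (1.0, 0.0)])
print(sorted(centers))  # converges to the two natural centroids
```

Note that the result is only a local optimum with respect to reassignments, which is precisely why an exact algorithm such as the one in this paper is of interest.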
Bounds are obtained in two ways. First, the sum-of-squares for entities already assigned to the same cluster during the resolution is a lower bound. Second, the set of entities to be clustered may be divided into subsets of smaller size, and the sum of the minimum sums-of-squares for each of these subsets is also a lower bound. These bounds are used for all subsets but one while the entities of that last subset are assigned; once they are assigned, the process continues with the entities of the second subset, and so forth. The bounds used tend not to
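The second bound above can be checked on a tiny instance by brute force. This is our own sketch: we enumerate all assignments of entities to M clusters to obtain true optima, then verify that summing the optima of two fixed subsets never exceeds the optimum of the full problem:

```python
# Brute-force check of the subset lower bound on a toy instance:
# opt(S1) + opt(S2) <= opt(S1 ∪ S2), where opt(.) is the minimum
# sum-of-squares over partitions into at most M clusters.
from itertools import product

def centroid(points):
    m = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(m)]

def ss(clusters):
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c)))
               for cl in clusters for c in [centroid(cl)] for p in cl)

def best_partition_ss(points, M):
    """Optimal MSSC value by enumerating all cluster-label assignments."""
    best = float("inf")
    for labels in product(range(M), repeat=len(points)):
        clusters = [[p for p, l in zip(points, labels) if l == k]
                    for k in range(M)]
        clusters = [c for c in clusters if c]  # drop empty clusters
        best = min(best, ss(clusters))
    return best

pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0), (9.0, 1.0), (10.0, 1.0)]
M = 2
full = best_partition_ss(pts, M)
# Solve the two fixed subsets to optimality; the sum is a lower bound.
lb = best_partition_ss(pts[:3], M) + best_partition_ss(pts[3:], M)
print(lb <= full + 1e-9)  # True
```

The bound holds because restricting an optimal partition of the full set to a subset gives a feasible (but generally suboptimal) partition of that subset, and distances to a subset's own centroids can only be smaller.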
doi:10.1137/s1064827597328327