Polynomial time approximation schemes for geometric k-clustering

R. Ostrovsky, Y. Rabani
Proceedings 41st Annual Symposium on Foundations of Computer Science  
We deal with the problem of clustering data points. Given n points in a larger set (for example, R^d) endowed with a distance function (for example, L_2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center", so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high dimensional geometric settings, even for k = 2. We give polynomial time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and R^d either with L_1 distance, or with L_2 distance, or with the square of L_2 distance. In all these settings, the best previous results were constant factor approximation guarantees. We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input. Our algorithms are based on a dimension reduction construction for the Hamming cube, which may be of independent interest.
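The objective described in the abstract can be made concrete with a minimal Python sketch. It is only an illustration of the cost function under the four distance measures named above, plus an exhaustive search over the Hamming cube for tiny inputs; it is not the approximation scheme of the paper, and all function names (`clustering_cost`, `brute_force_hamming`, and so on) are hypothetical.

```python
from itertools import combinations, product

# Distance functions for the settings named in the abstract.
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_squared(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def l2(x, y):
    return l2_squared(x, y) ** 0.5

def clustering_cost(points, centers, dist):
    """Sum over all points of the distance to the closest center.

    For fixed centers, assigning every point to its nearest center
    gives the cheapest partition, so this is the objective value.
    """
    return sum(min(dist(p, c) for c in centers) for p in points)

def brute_force_hamming(points, k):
    """Exhaustive search over all k-subsets of {0,1}^d as centers.

    Illustration only (assumed helper, exponential in d); it is NOT
    the paper's algorithm.
    """
    d = len(points[0])
    cube = list(product((0, 1), repeat=d))
    best = min(combinations(cube, k),
               key=lambda cs: clustering_cost(points, cs, hamming))
    return best, clustering_cost(points, best, hamming)

if __name__ == "__main__":
    pts = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0)]
    print(clustering_cost(pts, [(0, 0, 0), (1, 1, 1)], hamming))
    print(brute_force_hamming(pts, 2))
```

The exhaustive search examines every k-subset of the 2^d cube vertices, which is why it is usable only for very small d; the abstract's point is that the dimension is part of the input while k is fixed.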
doi:10.1109/sfcs.2000.892123 dblp:conf/focs/OstrovskyR00 fatcat:crozde7xvzhdzavwwy4vod6wlu