Makingk-means even faster [chapter]

Greg Hamerly
2010 Proceedings of the 2010 SIAM International Conference on Data Mining  
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. In this paper, we propose a new acceleration for exact k-means that gives the same answer, but is much faster in practice. Like Elkan's accelerated algorithm [8], our algorithm avoids distance computations using distance bounds and the triangle inequality. Our algorithm uses one novel lower bound for point-center distances, which allows it to eliminate the innermost k-means loop 80% of the time or
more » ... in our experiments. On datasets of low and medium dimension (e.g. up to 50 dimensions), our algorithm is much faster than other methods, including methods based on low-dimensional indexes, such as k-d trees. Other advantages are that it is very simple to implement and it has a very small memory overhead, much smaller than other accelerated algorithms.
doi:10.1137/1.9781611972801.12 dblp:conf/sdm/Hamerly10 fatcat:5d3ulgfynfb2bgvorezkm6szw4