Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links [chapter]

M. Eduardo Ares, Javier Parapar, Álvaro Barreiro
2009 Lecture Notes in Computer Science  
In this paper we present a new clustering algorithm that extends traditional batch k-means to allow domain knowledge to be introduced in the form of Must, Cannot, May and May-Not rules between data points. We also apply the method to the task of avoiding bias in clustering. Evaluation on standard collections showed considerable improvements in effectiveness over previous constrained and non-constrained algorithms for this task.
doi:10.1007/978-3-642-04417-5_32 fatcat:ly5wkjqxhva7pfdewo6cawdc4e
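
The abstract does not spell out how the soft rules enter the k-means update, so the following is only an illustrative sketch of one plausible reading: during the assignment step, a May-Not link adds a penalty to the cost of placing a point in a cluster that already holds its linked partner, without forbidding that placement. The function name `assign_soft`, the `may_not` pair set, and the `penalty` weight are all hypothetical and are not taken from the chapter.

```python
from math import dist


def assign_soft(points, centroids, may_not, penalty=1.0):
    """One assignment pass of batch k-means with soft May-Not links.

    points    : list of coordinate tuples
    centroids : list of coordinate tuples
    may_not   : set of frozenset({i, j}) index pairs that should
                preferably end up in different clusters (assumed encoding)
    penalty   : cost added for each May-Not partner already placed
                in the candidate cluster (assumed weighting)
    """
    labels = [None] * len(points)
    for i, p in enumerate(points):
        best, best_cost = None, float("inf")
        for c, centre in enumerate(centroids):
            # Standard k-means cost: squared Euclidean distance.
            cost = dist(p, centre) ** 2
            # Soft constraint: penalise, but do not forbid, joining a partner.
            for j, lab in enumerate(labels):
                if lab == c and frozenset((i, j)) in may_not:
                    cost += penalty
            if cost < best_cost:
                best, best_cost = c, cost
        labels[i] = best
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
plain = assign_soft(points, centroids, may_not=set())
linked = assign_soft(points, centroids, {frozenset((0, 1))}, penalty=100.0)
```

With no links, the two nearby points share a cluster; with a heavily weighted May-Not link between them, the second point is pushed to the other cluster. Points are processed in order, so the outcome of this sketch depends on that order, and a hard Cannot rule could be modelled in the same way by using an infinite penalty.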