Language Modelling of Constraints for Text Clustering [chapter]

Javier Parapar, Álvaro Barreiro
2012 Lecture Notes in Computer Science  
Constrained clustering is a recently proposed family of semi-supervised learning algorithms. These methods use domain information to impose constraints on the clustering output. The constraints (typically pairwise constraints between documents) are usually introduced by designing new clustering algorithms that enforce their satisfaction. In this paper we present an alternative approach to constrained clustering where, instead of defining new algorithms or ... ective functions, the constraints are introduced by modifying the document representation through language modelling. More precisely, the constraints are modelled using the well-known Relevance Models, which have been used successfully in other retrieval tasks such as pseudo-relevance feedback. To the best of our knowledge, this is the first attempt at such an approach. The results show that the presented approach is an effective method for constrained clustering, even improving on the results of existing constrained clustering algorithms.
doi:10.1007/978-3-642-28997-2_30 fatcat:nh3xkas2gfe6dlgdpgql4jzgfa
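The abstract does not state how the constraints enter the Relevance Model, so the following is only a minimal sketch under explicit assumptions, not the authors' formulation. It assumes that each document's must-link set (including the document itself) plays the role of the feedback set, that the document's own terms act as the "query", and it uses an RM1-style estimate P(w|R) ∝ Σ_D P(w|D) Π_q P(q|D) with a uniform document prior and Dirichlet-smoothed document models. The result is then interpolated with the document's maximum-likelihood model in the style of RM3. The function names `dirichlet_lm`, `relevance_model` and `constrained_representation` and the parameters `mu` and `lam` are hypothetical.

```python
import math
from collections import Counter


def dirichlet_lm(doc, collection, coll_len, mu=100.0):
    """Return w -> P(w|D), a Dirichlet-smoothed document language model."""
    tf = Counter(doc)
    dlen = len(doc)

    def p(w):
        p_c = collection[w] / coll_len  # collection (background) probability
        return (tf[w] + mu * p_c) / (dlen + mu)

    return p


def relevance_model(query, linked_docs, collection_docs, mu=100.0):
    """RM1-style estimate over the (assumed) must-link set of a document."""
    collection = Counter(w for d in collection_docs for w in d)
    coll_len = sum(collection.values())
    models = [dirichlet_lm(d, collection, coll_len, mu) for d in linked_docs]
    # Query likelihood of each linked document, computed in log space.
    log_w = [sum(math.log(m(q)) for q in query) for m in models]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # avoid underflow
    z = sum(weights)
    weights = [x / z for x in weights]
    # Restrict the model to terms seen in the linked documents, then renormalise.
    vocab = {w for d in linked_docs for w in d}
    rm = {w: sum(wt * m(w) for wt, m in zip(weights, models)) for w in vocab}
    norm = sum(rm.values())
    return {w: v / norm for w, v in rm.items()}


def constrained_representation(doc, linked_docs, collection_docs, lam=0.5, mu=100.0):
    """Interpolate the document's ML model with the relevance model (RM3-style)."""
    tf = Counter(doc)
    ml = {w: c / len(doc) for w, c in tf.items()}
    rm = relevance_model(doc, linked_docs, collection_docs, mu)
    vocab = set(ml) | set(rm)
    return {w: lam * ml.get(w, 0.0) + (1 - lam) * rm.get(w, 0.0) for w in vocab}


if __name__ == "__main__":
    docs = [
        ["clustering", "constraints", "documents"],
        ["clustering", "constraints", "pairwise"],
        ["relevance", "models", "feedback"],
    ]
    # Hypothetical must-link: docs[0] and docs[1] must share a cluster.
    rep = constrained_representation(docs[0], [docs[0], docs[1]], docs)
    for term, prob in sorted(rep.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {prob:.4f}")
```

In this sketch a must-linked document pulls terms such as "pairwise" into the representation of `docs[0]` even though that document never contains them. Any standard clustering algorithm run over the new representations would therefore see must-linked documents as more similar, with no change to the algorithm itself.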