Semi-supervised document clustering with dual supervision through seeding
Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12
Semi-supervised clustering algorithms for general problems use a small amount of labeled instances or pairwise instance constraints to aid the unsupervised clustering. However, user supervision can also be provided in alternative forms for document clustering, such as labeling a feature by associating it with a document or a cluster. Besides labeled documents, this paper also explores labeled features to generate cluster seeds to seed the unsupervised clustering. In this paper, we present a
... r, we present a unified framework in which one can use both labeled documents and features in terms of seeding clusters and refine this information using intermediate clusters. We introduce two methods of using labeled features to generate cluster seeds. Experimental results on several real-world data sets demonstrate that constraining the clustering by both documents and features seeding can significantly improve document clustering performance over random seeding and document only seeding.