Hierarchical Document Clustering Using Frequent Itemsets
[chapter]
2003
Proceedings of the 2003 SIAM International Conference on Data Mining
A major challenge in document clustering is the extremely high dimensionality of the data. For example, the vocabulary of a document set can easily run to thousands of words. On the other hand, each document often contains only a small fraction of the words in the vocabulary. These features require special handling. Another requirement is hierarchical clustering, in which clustered documents can be browsed by increasing specificity of topic. In this paper, we propose to use the notion of frequent itemsets,
doi:10.1137/1.9781611972733.6
dblp:conf/sdm/FungWE03
fatcat:cb6xz4azhba7nfvzvo2poohuvm