A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is
Exploiting Image Contents in Web Search
International Joint Conference on Artificial Intelligence
In this paper, we argue that the agglomerative clustering with vector cosine similarity measure performs poorly due to two reasons. First, the nearest neighbors of a document belong to different classes in many cases since any pair of documents shares lots of "general" words. Second, the sparsity of class-specific "core" words leads to grouping documents with the same class labels into different clusters. Both problems can be resolved by suitable smoothing of document model and usingdblp:conf/ijcai/ZhouD07 fatcat:vemop4hxsfdufafkqladgjjzz4