Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge [chapter]

Kiran G.V.R., Ravi Shankar, Vikram Pudi
2010, Lecture Notes in Computer Science (Springer Berlin Heidelberg)
High dimensionality is a major challenge in document clustering. Some recent algorithms address this problem by using frequent itemsets for clustering. However, most of these algorithms neglect the semantic relationships between words. On the other hand, there are algorithms that capture the semantic relations between words by making use of external knowledge contained in WordNet, MeSH, Wikipedia, etc., but they do not handle the high dimensionality. In this paper we present an efficient solution that addresses both of these problems. We propose a hierarchical clustering algorithm based on closed frequent itemsets that uses Wikipedia as external knowledge to enhance the document representation. We evaluate our methods using F-Score on standard datasets and show our results to be better than those of existing approaches.
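
To make the term "closed frequent itemset" concrete, the following is a minimal brute-force sketch, not the authors' algorithm. It treats each document as a set of terms, finds the term sets that occur in at least min_support documents, and keeps only those with no proper superset of equal support. The function name, the toy documents, and the threshold are all illustrative assumptions; the Wikipedia-based enrichment and the hierarchy construction described in the abstract are not shown.

```python
from itertools import combinations


def closed_frequent_itemsets(docs, min_support):
    """Return {itemset: support} for all closed frequent itemsets.

    Brute-force illustration only (hypothetical helper, exponential in
    vocabulary size); real miners use far more efficient strategies.
    """
    transactions = [frozenset(d) for d in docs]
    vocab = sorted(set().union(*transactions))

    frequent = {}
    # Enumerate candidates level by level. If no itemset of a given size
    # is frequent, no larger itemset can be (Apriori property), so stop.
    for size in range(1, len(vocab) + 1):
        found_any = False
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for t in transactions if items <= t)
            if support >= min_support:
                frequent[items] = support
                found_any = True
        if not found_any:
            break

    # An itemset is closed if no proper superset has the same support.
    return {
        items: sup
        for items, sup in frequent.items()
        if not any(items < other and sup == frequent[other] for other in frequent)
    }


# Toy corpus (assumed data): each document reduced to its set of terms.
docs = [
    ["cluster", "itemset", "wikipedia"],
    ["cluster", "itemset"],
    ["cluster", "wikipedia"],
    ["itemset", "mining"],
]
closed = closed_frequent_itemsets(docs, min_support=2)
for items, sup in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), sup)
```

In this toy corpus, {wikipedia} is frequent but not closed, because every document containing "wikipedia" also contains "cluster", so {cluster, wikipedia} has the same support. Restricting attention to closed itemsets in this way shrinks the set of candidate cluster labels without losing support information.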
doi:10.1007/978-3-642-15390-7_2 fatcat:nadxp6mh75az5kjtemoagwxbgq
