A copy of this work (PDF) was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
[chapter]
2009
Lecture Notes in Computer Science
A vast number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use an approach that treats (closed) sets of attributes with large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared with other established methods and software, such as biclustering, on the same
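The idea of treating closed attribute sets with large support as clusters can be illustrated with a small sketch. It assumes each document is already represented as a set of attributes (for example, shingles) and that "large support" means an extent of at least a hypothetical `min_support` documents. The names `extent` and `closed_itemsets` are illustrative and do not come from the paper. The brute-force closure below only suits toy data; real web collections would need a dedicated closed-itemset miner.

```python
def extent(intent, docs):
    """Return the ids of documents whose attribute set contains `intent`."""
    return frozenset(d for d, attrs in docs.items() if intent <= attrs)


def closed_itemsets(docs, min_support):
    """Return (intent, extent) pairs for nonempty closed attribute sets
    whose support (number of documents in the extent) is >= min_support.

    Every closed set with a nonempty extent is the intersection of the
    attribute sets of some documents, so we close the family of document
    attribute sets under pairwise intersection until nothing new appears.
    """
    closed = {frozenset(attrs) for attrs in docs.values()}
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c not in closed:
                    new.add(c)
        closed |= new
        frontier = new

    result = []
    for intent in closed:
        ext = extent(intent, docs)
        # Skip the empty attribute set: it says nothing about similarity.
        if intent and len(ext) >= min_support:
            result.append((intent, ext))
    return result


# Toy data: d1/d2/d3 are near duplicates, and so are d4/d5.
docs = {
    "d1": {"a", "b", "c", "d"},
    "d2": {"a", "b", "c", "e"},
    "d3": {"a", "b", "c", "d"},
    "d4": {"x", "y", "z"},
    "d5": {"x", "y", "w"},
}

# Print clusters by decreasing support, ties broken by attribute set.
for intent, ext in sorted(closed_itemsets(docs, 2),
                          key=lambda p: (-len(p[1]), sorted(p[0]))):
    print(sorted(intent), sorted(ext))
# ['a', 'b', 'c'] ['d1', 'd2', 'd3']
# ['a', 'b', 'c', 'd'] ['d1', 'd3']
# ['x', 'y'] ['d4', 'd5']
```

Each extent in the output is a candidate cluster of near-duplicate documents, and its intent lists the attributes those documents share. Because closed sets are intersections of document descriptions, the same document can fall into nested clusters (here `d1` and `d3` appear in both the `abc` and `abcd` clusters).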
doi:10.1007/978-3-642-03079-6_15
fatcat:wjozoujkbfbjjawoeiyqmetljq