Document Clustering with K-tree
[chapter]
2009
Lecture Notes in Computer Science
We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. ...
K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. ...
K-tree consistently found higher purity clusters than other submissions. Even with many small high purity clusters, K-tree achieved a high micro purity score. ...
doi:10.1007/978-3-642-03761-0_43
fatcat:ajzsw6lsyneljnhr7rksa3oxcq
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. ...
The K-tree has a low time complexity that is suitable for large document collections. ...
MEDOID K-TREE We propose an extension to K-tree where all cluster centres are document exemplars. This is inspired by the k-medoids algorithm [5]. ...
doi:10.1145/1571941.1572094
dblp:conf/sigir/VriesG09
fatcat:ztrmvtlabjdjzoe6aqizk3mnve
Random Indexing K-tree
[article]
2010
arXiv
pre-print
The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. ...
Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. ...
K-tree and Document Clustering The K-tree algorithm is well suited to clustering large document collections due to its low time complexity. ...
arXiv:1001.0833v2
fatcat:hyarhomkhnbsrlfdxfahu7qreq
Parallel Streaming Signature EM-tree
2015
Proceedings of the 24th International Conference on World Wide Web - WWW '15
The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. ...
Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. ...
We observed this with EM-tree in prior work [12]. The EM-tree can be seeded with k-means||. ...
doi:10.1145/2736277.2741111
dblp:conf/www/VriesVGN15
fatcat:htqbtmzvebbaxfng5benjvjpse
Clustering with Random Indexing K-tree and XML Structure
[chapter]
2010
Lecture Notes in Computer Science
The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies. ...
The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. ...
The RI projection produces dense document vectors that work well with the K-tree algorithm. Cluster quality has been measured with two metrics this year. ...
doi:10.1007/978-3-642-14556-8_40
fatcat:t3kosfq3jreuhnjcnggffevsoa
Unsupervised style classification of document page images
2005
IEEE International Conference on Image Processing 2005
Finally, the K-medoids algorithm is used to find an optimal grouping of the trees into K clusters, each of which corresponds to a distinct document style. ...
We evaluate our algorithm on test datasets with different cluster sizes and degrees of style similarity. ...
doi:10.1109/icip.2005.1530104
dblp:conf/icip/MaoNT05
fatcat:dmhiqnadizgini2dwrrecg5v2q
An Analytical Assessment on Document Clustering
2012
International Journal of Computer Network and Information Security
Clustering is related to data mining for information retrieval. Relevant information is retrieved quickly while doing the clustering of documents. ...
Index Terms: Data mining, Document clustering, Suffix Tree Clustering (STC) steps, K-means, Agglomerative Hierarchical Clustering (AHC), cosine similarity
I. INTRODUCTION Clustering is raised from the ...
SUFFIX TREE CLUSTERING ALGORITHM Suffix Tree Clustering [1, 12] uses the concept of document clustering for clustering the documents. ...
doi:10.5815/ijcnis.2012.05.08
fatcat:4nt2ryrsbjetzfpbzccloqwnmq
Hierarchical Document Clustering Using Frequent Itemsets
[chapter]
2003
Proceedings of the 2003 SIAM International Conference on Data Mining
Frequent itemsets are also used to produce a hierarchical topic tree for clusters. By focusing on frequent items, the dimensionality of the document set is drastically reduced. ...
The intuition of our clustering criterion is that each cluster is identified by some common words, called frequent itemsets, for the documents in the cluster. ...
Acknowledgment The initial phase of this work benefited considerably from extensive discussions with Leo Chen and Linda Wu. ...
doi:10.1137/1.9781611972733.6
dblp:conf/sdm/FungWE03
fatcat:cb6xz4azhba7nfvzvo2poohuvm
Clustering XML Documents by Structure
[chapter]
2004
Lecture Notes in Computer Science
Modeling the XML documents as rooted ordered labeled trees, we explore the application of clustering algorithms using distances that estimate the similarity between those trees in terms of the hierarchical ...
This paper presents a framework for clustering XML documents by structure. ...
Clustering XML documents We deal with the problem of clustering XML documents using 1. structural summaries of their representative rooted ordered labeled trees, 2. tree edit distances between these summaries ...
doi:10.1007/978-3-540-24674-9_13
fatcat:6teg5mmjajcjbizz3llq6li2ky
A methodology for clustering XML documents by structure
2006
Information Systems
Modeling the XML documents as rooted ordered labeled trees, we explore the application of clustering algorithms using distances that estimate the similarity between those trees in terms of the hierarchical ...
This paper presents a framework for clustering XML documents by structure. ...
Clustering XML documents We deal with the problem of clustering XML documents using 1. structural summaries of their representative rooted ordered labeled trees, 2. tree edit distances between these summaries ...
doi:10.1016/j.is.2004.11.009
fatcat:pxvevu7vafevtm4f5oich2cvim
Clustering XML Documents Using Structural Summaries
[chapter]
2004
Lecture Notes in Computer Science
Modeling XML documents with tree-like structures, we face the 'clustering XML documents by structure' problem as a 'tree clustering' problem, exploiting distances that estimate the similarity between those ...
This work presents a methodology for grouping structurally similar XML documents using clustering algorithms. ...
..., D_{c_k} for every cluster C_1, C_2, ..., C_k, using the XML documents assigned to that cluster. ...
doi:10.1007/978-3-540-30192-9_54
fatcat:5y4s7zxbnva4pjupwcxjord7ji
Efficient retrieval of the top-k most relevant spatial web objects
2009
Proceedings of the VLDB Endowment
Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. ...
This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. ...
The authors thank Xin Cao for pre-processing the text documents used in the experiments. ...
doi:10.14778/1687627.1687666
fatcat:gxltjvze55cbrggijk7idz252y
Topological Tree for Web Organisation, Discovery and Exploration
[chapter]
2004
Lecture Notes in Computer Science
Each chain fully adapts to a specific topic, where its number of subtopics is determined using entropy-based validation and cluster tendency schemes. ...
The tree is generated using an algorithm called Automated Topological Tree Organiser, which uses a set of hierarchically organised self-organising growing chains. ...
Fig. 2. ATTO Topological Tree (left) and bisecting k-means binary tree (right)
Fig. 1. ...
doi:10.1007/978-3-540-28651-6_70
fatcat:l74vfotkffgkxhi3etfzbwb2ye
XML Data Integration Based on Content and Structure Similarity Using Keys
[chapter]
2008
Lecture Notes in Computer Science
Second, we measure the similarity degree based on data and structures of the two XML documents. ...
This paper proposes a technique for approximately matching XML data based on the content and structure by detecting the similarity of subtrees clustered semantically using leaf-node parents. ...
SLAX divides XML documents into smaller portions by parsing XML documents into K document trees. ...
doi:10.1007/978-3-540-88871-0_35
fatcat:evmlsnddqbfarkphvdvppke3j4
Multisets and Clustering XML Documents
2007
19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007)
We use operations on multisets of paths of document trees to define certain metrics on multisets. ...
These metrics are used for clustering real and synthesized XML documents to produce high-quality clusterings. ...
We model an XML document as a labeled rooted tree and represent the rooted labeled paths (a sequence of nodes of the tree starting with the root of the tree and ending with a leaf node) of the tree as ...
doi:10.1109/ictai.2007.18
dblp:conf/ictai/IyerS07
fatcat:2ibrazpm5bcudaxogeh5ffzfxi