Clustering XML Documents Using Frequent Subtrees [chapter]

Sangeetha Kutty, Tien Tran, Richi Nayak, Yuefeng Li
2009 Lecture Notes in Computer Science  
The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents.  ...  The closed frequent subtrees are then used to extract the constrained content from the documents.  ...  In order to cluster the XML documents, we have used content corresponding to the frequent subtrees in a given document and have generated a terms by document matrix.  ... 
doi:10.1007/978-3-642-03761-0_45 fatcat:3jopojjmmnhapkqyixqezs75ie

Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach [chapter]

Sangeetha Kutty, Tien Tran, Richi Nayak, Yuefeng Li
Lecture Notes in Computer Science  
This matrix is used to progressively cluster the XML documents.  ...  Using the structural information of the XML documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents.  ...  In order to cluster the XML documents, we have used a frequent subtree -document matrix generated from closed frequent subtrees.  ... 
doi:10.1007/978-3-540-85902-4_17 fatcat:2iskuu7dzzb2bkcqjpw2rbltha

XCFS

Sangeetha Kutty, Richi Nayak, Yuefeng Li
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09  
This paper introduces a novel approach that first determines structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML  ...  An XML clustering algorithm should process both structural and content information of XML documents in order to improve the accuracy and meaning of the clustering solution.  ...  Unlike the clustering of text documents, XML document clustering is an intricate process.  ... 
doi:10.1145/1645953.1646216 dblp:conf/cikm/KuttyNL09 fatcat:abj64ezia5ayhbrfd4mgteqmci

Utilising Semantic Tags in XML Clustering [chapter]

Sangeetha Kutty, Richi Nayak, Yuefeng Li
2010 Lecture Notes in Computer Science  
This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents.  ...  This paper presents an overview of the experiments conducted using Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track.  ...  of our previous work, Hybrid Clustering of XML documents (HCX) [7]. This method utilises frequent subtrees extracted from the structure of XML documents to obtain the content in order to cluster the documents  ... 
doi:10.1007/978-3-642-14556-8_41 fatcat:pjdutlxb3zasbat2ds4qogslim

XML Documents Clustering Using Tensor Space Model -- A Preliminary Study

Sangeetha Kutty, Richi Nayak, Yuefeng Li
2010 2010 IEEE International Conference on Data Mining Workshops  
A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML.  ...  Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering.  ...  The clusters of subtrees, called Closed Frequent Induced Subtree Cluster ( ), become a tensor dimension for representing and analyzing XML documents.  ... 
doi:10.1109/icdmw.2010.106 dblp:conf/icdm/KuttyNL10 fatcat:xtmoxebrrfcfrb6d7ajuqmdc3u

Mining Maximal Frequently Changing Subtree Patterns from XML Documents [chapter]

Ling Chen, Sourav S. Bhowmick, Liang-Tien Chia
2004 Lecture Notes in Computer Science  
In this paper, we focus on the sequence of changes to the structures of an XML document to find out which subtrees in the XML structure frequently change together, which we call Frequently Changing Subtree  ...  Due to the dynamic nature of online information, XML documents typically evolve over time. The change of the data values or structures of an XML document may exhibit some particular patterns.  ...  -Structure-based Document Clustering: Clustering XML documents based on the structures embedded in documents is proposed in [12] .  ... 
doi:10.1007/978-3-540-30076-2_7 fatcat:66lxawzpfnbslcx4jzalvcx7ae

XML structural delta mining: Issues and challenges

Qiankun Zhao, Ling Chen, Sourav S. Bhowmick, Sanjay Madria
2006 Data & Knowledge Engineering  
Such knowledge can be useful in many applications such as change detection for very large XML documents, efficient XML indexing, and XML search engine etc.  ...  The idea of XML structural delta mining is to discover knowledge from sequences of structural changes to XML documents, which is also called XML structural delta.  ...  Applications for XML Structural Delta Association Structure-based Document Clustering: Clustering XML documents based on the structures embedded in documents is proposed in [31] .  ... 
doi:10.1016/j.datak.2005.10.002 fatcat:ynzathh2nbhufjjygmvt6hueva

XML Documents Clustering Using a Tensor Space Model [chapter]

Sangeetha Kutty, Richi Nayak, Yuefeng Li
2011 Lecture Notes in Computer Science  
This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering.  ...  The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents.  ...  The clusters of CFI subtrees, called Closed Frequent Induced Subtree Cluster (CFISC), become a tensor order for representing and analyzing XML documents.  ... 
doi:10.1007/978-3-642-20841-6_40 fatcat:gmdrv3onzrcqbesezozsn7zkdu

Clustering XML documents by patterns

Maciej Piernik, Dariusz Brzezinski, Tadeusz Morzy
2015 Knowledge and Information Systems  
Now that the use of XML is prevalent, methods for mining semi-structured documents have become even more important.  ...  In this paper, we study clustering algorithms, which use patterns to cluster documents without the need for pairwise comparisons.  ...  PathXP uses groups of frequent paths, called profiles, to cluster XML documents in a divisive manner.  ... 
doi:10.1007/s10115-015-0820-0 fatcat:em77rbj3m5d4fcac5pxbfri3uy

Mining Positive and Negative Association Rules from XML Query Patterns for Caching [chapter]

Ling Chen, Sourav S. Bhowmick, Liang-Tien Chia
2005 Lecture Notes in Computer Science  
We cluster XML queries according to their semantics first and then mine association rules between the clusters.  ...  However, frequent XML query patterns mined by these approaches ignore the temporal sequence between user queries.  ...  Initializing Clusters In this step, we construct the initial clusters. We use the frequent rooted subtrees as the labels of the initial clusters.  ... 
doi:10.1007/11408079_67 fatcat:a5jxnucxvjckvjej2fbb3p5y34

The importance of sibling clustering for efficient bulkload of XML document trees

C. C. Kanne, G. Moerkotte
2006 IBM Systems Journal  
In an XML Data Store (XDS), importing documents from external sources is a very frequent operation.  ...  This involves two major subtasks: (1) Partitioning the documents' logical tree structure into subtrees smaller than a disk page in a way that is both space-efficient and suitable for later processing. (  ...  In contrast, an XDS needs to support document imports as a regular operation which is used very frequently by applications.  ... 
doi:10.1147/sj.452.0321 fatcat:ojam2th4ajgjfnydoftypozkhi

Evaluating Structural Similarity in XML Documents

Andrew Nierman, H. V. Jagadish
2002 International Workshop on the Web and Databases  
Given a collection of documents derived from multiple DTDs, we can compute pair-wise distances between documents in the collection, and then use these distances to cluster the documents.  ...  XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML.  ...  cluster C_i and doc_k^{C_i} is the k-th XML document in the cluster C_i.  ... 
dblp:conf/webdb/NiermanJ02 fatcat:soyzch5o7rerdnebgskn54ligu

FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents

Ling Chen, Sourav S. Bhowmick, Liang-Tien Chia
2006 Data & Knowledge Engineering  
Knowledge obtained from FRACTURE is useful in applications such as XML indexing, XML clustering etc.  ...  A discovered FRACTURE is a set of substructures of an XML document that frequently change together.  ...  Knowledge obtained from FRACTUREs can be useful in applications such as XML indexing, XML clustering etc.  ... 
doi:10.1016/j.datak.2005.09.002 fatcat:boz2toeej5gk5dw5k3npl76vgq

XML clustering: a review of structural approaches

Maciej Piernik, Dariusz Brzezinski, Tadeusz Morzy, Anna Lesniewska
2014 Knowledge engineering review (Print)  
In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality.  ...  A common problem among the mentioned applications involves structural clustering of XML documents—an issue that has been thoroughly studied and led to the creation of a myriad of approaches.  ...  Acknowledgement The authors wish to thank the editor and the anonymous reviewers for their useful comments and suggestions.  ... 
doi:10.1017/s0269888914000216 fatcat:icmzquio7vee7eaqwtabejm7ne

Effective XQuery keyword using XML query processing

E. Seshatheri, T. Bhuvaneswari
2018 Indonesian Journal of Electrical Engineering and Computer Science  
This paper proposes the query answering system of Linear search using wild card search for extracting the frequent pattern to maximize your search results in library database on XML document to extract the  ...  The data has structured is determined using the standard is known as XML whereas large amount of data has consumed through internet consist of the both structural data format as well as semi structural  ...  The interesting patterns among the subtrees of the given XML document can be identified.  ... 
doi:10.11591/ijeecs.v14.i1.pp450-454 fatcat:tgicdcipvbbszaigd276ayeslu