Mong Li Lee, Liang Huai Yang, Wynne Hsu, Xia Yang
2002 Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02  
It is increasingly important to develop scalable integration techniques for the growing number of XML data sources. A practical starting point for the integration of large numbers of Document Type Definitions (DTDs) of XML sources would be to first find clusters of DTDs that are similar in structure and semantics. Reconciling similar DTDs within such a cluster will be an easier task than reconciling DTDs that are different in structure and semantics as the latter would involve more
... We introduce XClust, a novel integration strategy that involves the clustering of DTDs. A matching algorithm based on the semantics, immediate descendants, and leaf-context similarity of DTD elements is developed. Our experiments to integrate real-world DTDs demonstrate the effectiveness of the XClust approach.
doi:10.1145/584838.584841 fatcat:wlojjqf6ubdmzhlbojfy2vm63a