Jun Huan, Wei Wang, Jan Prins, Jiong Yang
2004 Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04  
One fundamental challenge for mining recurring subgraphs from semi-structured data is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e., subgraphs that are not part of any other frequent subgraph. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude. Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a significant speedup over the current state-of-the-art subgraph mining algorithms.
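To make the notion of a maximal frequent pattern concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a toy setting in which every vertex carries a unique label, so a graph can be represented as a set of labeled edges and subgraph containment reduces to subset containment. It also ignores the connectivity requirement and enumerates candidates by brute force. The names `frequent_edge_sets`, `maximal_only`, and the example database are hypothetical.

```python
from itertools import combinations

def frequent_edge_sets(database, min_support):
    """Brute-force enumeration of edge sets contained in >= min_support graphs.

    Assumption: vertices are uniquely labeled, so "is a subgraph of"
    is simply "is a subset of the edge set".
    """
    candidates = set()
    for graph in database:
        edges = sorted(graph)
        for size in range(1, len(edges) + 1):
            for combo in combinations(edges, size):
                candidates.add(frozenset(combo))
    return {
        pattern
        for pattern in candidates
        if sum(pattern <= graph for graph in database) >= min_support
    }

def maximal_only(patterns):
    """Keep only patterns that are not a proper subset of another pattern."""
    return {p for p in patterns if not any(p < q for q in patterns)}

# Hypothetical toy database: three graphs over uniquely labeled vertices.
database = [
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}),
    frozenset({("a", "b"), ("b", "c"), ("x", "y")}),
]

frequent = frequent_edge_sets(database, min_support=2)
maximal = maximal_only(frequent)
print(len(frequent), len(maximal))
```

In this toy case every non-empty subset of the three shared edges is frequent, yet a single maximal pattern summarizes all of them, which illustrates why restricting the output to maximal patterns can shrink it so sharply.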
doi:10.1145/1014052.1014123 dblp:conf/kdd/HuanWPY04 fatcat:ahjzbnbcafandoodi3f6kfxix4