Efficiently mining frequent trees in a forest

Mohammed J. Zaki
2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02  
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the
more » ... e and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.
doi:10.1145/775056.775058 fatcat:nq37ceqepjdrha7bpwceuvs4ma