Efficient Frequent Pattern Mining on Web Logs [chapter]

Liping Sun, Xiuzhen Zhang
2004 Lecture Notes in Computer Science  
Mining frequent patterns from web log data can help to optimise the structure of a web site and improve the performance of web servers. Web users can also benefit from these frequent patterns. Much effort has gone into mining frequent patterns efficiently. The candidate-generation-and-test approach (Apriori and its variants) and the pattern-growth approach (FP-growth and its variants) are the two representative frequent pattern mining approaches. Neither the candidate-generation-and-test approach nor the pattern-growth approach is always good on web log data. We have conducted extensive experiments on real-world web log data to analyse the characteristics of web logs and the behaviour of these two approaches on web log data. We propose a new algorithm, the Combined Frequent Pattern Mining (CFPM) algorithm, designed specifically for web log data. We use heuristics drawn from web log data to prune the search space and eliminate many unnecessary operations in mining, so that better efficiency is achieved. Experimental results show that CFPM improves on the performance of the pattern-growth approach by 1.2 to 7.8 times for frequent pattern mining on web log data.
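For readers unfamiliar with the candidate-generation-and-test approach named in the abstract, the following is a minimal Apriori-style sketch in Python. It is not the CFPM algorithm and is not taken from the chapter. The function name `apriori`, the toy `sessions` data, and the choice to treat each session as an unordered set of pages are assumptions made purely for illustration.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): count} for every frequent page set.

    Illustrative only: each session is treated as an unordered set of
    pages, and min_support is an absolute session count (both assumptions).
    """
    transactions = [frozenset(s) for s in sessions]

    # Level 1: count how many sessions contain each single page.
    counts = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Generate: join frequent (k-1)-sets into k-set candidates and
        # keep only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test: scan the sessions to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result


# Hypothetical toy sessions, each a list of requested pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
patterns = apriori(sessions, min_support=2)
```

Each pass over the sessions in the test step is where the cost of this approach accumulates on large logs. Pattern-growth methods such as FP-growth avoid repeated scans by mining a compressed tree built from the sessions.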
doi:10.1007/978-3-540-24655-8_58 fatcat:yclxkac5xfckfhuftxmet75hzq