Utility-driven Mining of Contiguous Sequences [article]

Chunkai Zhang, Quanjian Dai, Zilin Du, Wensheng Gan, Jian Weng, Philip S. Yu
2021 arXiv pre-print
Recently, contiguous sequential pattern mining (CSPM) has gained interest as a research topic due to its varied potential real-world applications, such as web log and biological sequence analysis. To date, studies on the CSPM problem remain in preliminary stages. Existing CSPM algorithms lack the efficiency to satisfy users' needs and can still be improved in terms of runtime and memory consumption. In addition, existing algorithms were developed to deal with simple sequence data, working with one event at a time. Complex sequence data, which represent multiple events occurring simultaneously, are also commonly observed in real life. In this paper, we propose a novel algorithm, fast utility-driven contiguous sequential pattern mining (FUCPM), to address the CSPM problem. FUCPM adopts a compact sequence information list and instance chain structures to store the necessary information about the database and candidate patterns. For further efficiency, we develop the global unpromising items pruning and local unpromising items pruning strategies, based on sequence-weighted utilization and item-extension utilization, to reduce the search space. Extensive experiments on real-world and synthetic datasets demonstrate that FUCPM outperforms the state-of-the-art algorithms and is scalable enough to handle complex sequence data.
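
As a rough illustration of pruning by sequence-weighted utilization (SWU), the sketch below uses the usual definition from the utility-mining literature: an item's SWU is the sum of the total utilities of all sequences that contain it, and an item whose SWU falls below the minimum utility threshold cannot appear in any high-utility pattern. This is not the authors' implementation. The data layout (a sequence as a list of itemsets, each a dict from item to utility), the names `sequence_utility`, `swu`, and `prune_unpromising`, the toy database `DB`, and the choice to repeat the pruning until no item is removed are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy database: each sequence is a list of itemsets, and each
# itemset maps an item to its utility within that itemset.
DB = [
    [{"a": 2, "b": 3}, {"c": 1}, {"a": 4}],  # sequence utility 10
    [{"b": 1}, {"d": 2}],                    # sequence utility 3
    [{"a": 5}, {"c": 2, "d": 1}],            # sequence utility 8
]


def sequence_utility(seq):
    """Total utility of one sequence: the sum over all items in all itemsets."""
    return sum(u for itemset in seq for u in itemset.values())


def swu(db):
    """Sequence-weighted utilization of every item occurring in the database."""
    totals = defaultdict(int)
    for seq in db:
        su = sequence_utility(seq)
        # Count each sequence once per distinct item it contains.
        for item in {item for itemset in seq for item in itemset}:
            totals[item] += su
    return dict(totals)


def prune_unpromising(db, min_util):
    """Remove items whose SWU is below min_util, repeating until stable.

    Assumption of this sketch: emptied itemsets are kept as placeholders so
    that the adjacency of the remaining itemsets is unchanged, which matters
    when patterns must be contiguous.
    """
    while True:
        weak = {item for item, score in swu(db).items() if score < min_util}
        if not weak:
            return db
        db = [
            [{i: u for i, u in itemset.items() if i not in weak} for itemset in seq]
            for seq in db
        ]


if __name__ == "__main__":
    print(swu(DB))
    print(prune_unpromising(DB, 12))
```

With a threshold of 12, item `d` is removed first (SWU 11). That lowers the utilities of the sequences that contained it, which in turn pushes `b` below the threshold on the next pass, so only `a` and `c` remain.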
arXiv:2111.00247v1 fatcat:jywfqf3tyjhxrcilcnp4oo4ntu