ProUM: Projection-based Utility Mining on Sequence Data [article]

Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, Philip S. Yu
2019 arXiv pre-print
Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) over the past decade. Utility mining has attracted a great deal of attention, but most existing studies were developed for itemset-based data. Time-ordered sequence data, which differs from itemset-based data, is more common in real-world situations. Because they are time-consuming and require a large amount of memory, current utility mining algorithms still have limitations when dealing with sequence data. In addition, the efficiency of utility mining on sequence data still needs to be improved, especially for long sequences or when the minimum utility threshold is low. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. A utility-array structure is designed to store the necessary sequence-order and utility information. ProUM significantly improves mining efficiency by using the projection technique to generate utility-arrays, and it effectively reduces memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed, and several pruning strategies are applied to further improve the efficiency of ProUM. By taking utility theory into account, the derived high-utility sequential patterns carry more insightful and interesting information than other kinds of patterns. Experimental results showed that the proposed ProUM algorithm significantly outperformed the state-of-the-art algorithms in terms of execution time, memory usage, and scalability.
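To make the notion of a high-utility sequential pattern concrete, the following is a minimal brute-force sketch in Python. It is not the ProUM algorithm: it uses no utility-array, projection, or pruning. It assumes the convention common in the high-utility sequential pattern mining literature, where the utility of a pattern in one sequence is the maximum over all of its occurrences and the utility in the database is the sum over sequences. The abstract does not state that definition, and all names and data below (`profit`, `db`, `pattern_utility`, `min_util`) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Assumed data model: a quantitative sequence is a time-ordered list of
# itemsets, each mapping an item to the quantity observed at that time step.
QSequence = List[Dict[str, int]]
Pattern = List[FrozenSet[str]]


def utility_in_sequence(pattern: Pattern, qseq: QSequence,
                        profit: Dict[str, int]) -> Optional[int]:
    """Return the maximum utility over all occurrences of `pattern` in
    `qseq`, or None if the pattern does not occur."""
    # best[j] = highest utility of matching the first j pattern elements
    # against the itemsets scanned so far (None = no match yet).
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Go from the last element down so that one itemset is matched to
        # at most one pattern element.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            element = pattern[j - 1]
            if prev is None or not all(i in itemset for i in element):
                continue
            gain = sum(itemset[i] * profit[i] for i in element)
            if best[j] is None or prev + gain > best[j]:
                best[j] = prev + gain
    return best[-1]


def pattern_utility(pattern: Pattern, db: List[QSequence],
                    profit: Dict[str, int]) -> int:
    """Sum the per-sequence utilities over sequences containing the pattern."""
    total = 0
    for qseq in db:
        u = utility_in_sequence(pattern, qseq, profit)
        if u is not None:
            total += u
    return total


# Hypothetical toy data: unit profits and two quantitative sequences.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"a": 1, "c": 1}, {"c": 5}],
]
a_then_c = [frozenset({"a"}), frozenset({"c"})]
min_util = 10  # hypothetical minimum utility threshold
is_high_utility = pattern_utility(a_then_c, db, profit) >= min_util
```

Under these assumptions, a pattern counts as high-utility when its database utility reaches the minimum utility threshold. Enumerating candidate patterns with such a check is expensive on long sequences, which is the cost that upper bounds and pruning strategies are meant to reduce.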
arXiv:1904.07764v2 fatcat:e7y2nnnk4rbrrcwtupeizdnple