Directed Graph based Distributed Sequential Pattern Mining Using Hadoop MapReduce
International Journal on Recent and Innovation Trends in Computing and Communication
Conventional sequential pattern mining algorithms run into scalability problems when dealing with very large data sets. In existing approaches such as PrefixSpan and UDDAG, most of the time is spent generating projected databases (the prefix- and suffix-projected databases) from the given sequence database. DSPM (Distributed Sequential Pattern Mining) introduces a directed graph to generate the prefix- and suffix-projected databases, which reduces the time spent scanning a large database. In UDDAG, for each unique
... d UDDAG is created to find the next-level sequential patterns, so a large amount of storage is required for each UDDAG. In DSPM, a single directed graph is used both to generate the projected databases and to find patterns. To address the scanning time and scalability problems, we introduce a distributed sequential pattern mining algorithm on the Hadoop platform using the MapReduce programming model. We use a transformed database to reduce scanning time and a directed graph to reduce memory usage. The Mapper constructs the prefix- and suffix-projected databases for each length-1 frequent item in parallel. The Reducer combines all intermediate results to obtain the final sequential patterns. Experiments compare DSPM against UDDAG across different minimum support values and different massive data sets, with and without the Hadoop platform; running on Hadoop improves scaling and speed. The experimental results show that DSPM using Hadoop MapReduce solves the scalability problem as well as the storage problem of UDDAG.
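
The division of work between Mapper and Reducer described above can be illustrated with a small in-process sketch. This is not the DSPM algorithm itself: it assumes sequences of single items rather than itemsets, uses plain PrefixSpan-style suffix projection in place of the directed graph and transformed database, and simulates the map and reduce phases with ordinary Python functions instead of Hadoop. All function names (frequent_items, project, mine, mapper, reducer, run) and the toy database are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequent_items(db, min_support):
    """Return the length-1 frequent items (each item counted once per sequence)."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    return sorted(item for item, c in counts.items() if c >= min_support)


def project(db, item):
    """Suffix-projected database: what follows the first occurrence of item.

    Sequences without the item, or with nothing after it, are dropped.
    """
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def mine(prefix, support, projected, min_support):
    """Grow patterns from prefix recursively (PrefixSpan-style, an assumption)."""
    patterns = [(prefix, support)]
    counts = Counter()
    for seq in projected:
        counts.update(set(seq))
    for item in sorted(counts):
        if counts[item] >= min_support:
            patterns += mine(prefix + (item,), counts[item],
                             project(projected, item), min_support)
    return patterns


def mapper(item, db, min_support):
    """One map task per length-1 frequent item: mine every pattern starting with it."""
    support = sum(1 for seq in db if item in seq)
    yield from mine((item,), support, project(db, item), min_support)


def reducer(mapped):
    """Combine the intermediate (pattern, support) pairs into the final result."""
    return {pattern: support for pattern, support in mapped}


def run(db, min_support):
    """Simulate the job: in Hadoop the mapper calls would run in parallel."""
    items = frequent_items(db, min_support)
    mapped = (pair for item in items for pair in mapper(item, db, min_support))
    return reducer(mapped)


# Toy sequence database (hypothetical data).
example_db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
patterns = run(example_db, min_support=2)
```

Because each map task depends only on its own length-1 item and the shared database, the tasks are independent of one another, which is what makes the per-item partitioning suitable for parallel execution. On the toy database with a minimum support of 2, the pattern (a, b) is reported with support 2, while (a, b, c) occurs in only one sequence and is not reported.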