Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins

Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk, Eamonn Keogh
2016 IEEE 16th International Conference on Data Mining (ICDM)
Time series motifs have been in the literature for about fifteen years, but have only recently begun to receive significant attention in the research community. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, etc. Recent work has improved the scalability to the point where exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology, there is an insatiable need to address even larger datasets. In this work we show that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred million subsequences, by far the largest dataset ever mined for time series motifs. Furthermore, we demonstrate that our algorithm can produce actionable insights in seismology and other domains.
doi:10.1109/icdm.2016.0085 dblp:conf/icdm/ZhuZSYFMBK16 fatcat:am6llbgs4nghnlsi3s3jsyp344