Subsequence matching on structured time series data

Huanmei Wu, Betty Salzberg, Gregory C Sharp, Steve B Jiang, Hiroki Shirato, David Kaeli
2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05  
Subsequence matching in time series databases is a useful technique, with applications in pattern matching, prediction, and rule discovery. Internal structure within the time series data can be used to improve these tasks, and provide important insight into the problem domain. This paper introduces our research effort in using the internal structure of a time series directly in the matching process. This idea is applied to the problem domain of respiratory motion data in cancer radiation treatment. We propose a comprehensive solution for analysis, clustering, and online prediction of respiratory motion using subsequence similarity matching. In this system, a motion signal is captured in real time as a data stream, and is analyzed immediately for treatment and also saved in a database for future study. A piecewise linear representation of the signal is generated from a finite state model, and is used as a query for subsequence matching. To ensure that the query subsequence is representative, we introduce the concept of subsequence stability, which can be used to dynamically adjust the query subsequence length. To satisfy the special needs of similarity matching over breathing patterns, a new subsequence similarity measure is introduced. This new measure uses a weighted L1 distance function to capture the relative importance of each source stream, amplitude, frequency, and proximity in time. From the subsequence similarity measure, stream and patient similarity can be defined, which are then used for offline and online applications. The matching results are analyzed and applied for motion prediction and correlation discovery. While our system has been customized for use in radiation therapy, our approach to time series modeling is general enough for application domains with structured time series data. Although recently there are research efforts in characterization and prediction [3, 20, 24, 26], the best parameterization for respiratory
doi:10.1145/1066157.1066235 dblp:conf/sigmod/WuSSJSK05 fatcat:74gxldc64zbqrlg2cu44zsjrzu