T-Cube: A Data Structure for Fast Extraction of Time Series from Large Datasets [report]

Maheshkumar Sabhnani, Andrew W. Moore, Artur W. Dubrawski
2007 unpublished
Standard Form 298 (Rev. 8-98): 21 pages; report, abstract, and this page unclassified; limitation of abstract: Same as Report (SAR).

Abstract

This report introduces a data structure called T-Cube, designed to dramatically improve response time for ad-hoc time series queries against large datasets. We have tested T-Cube on both synthetic and real-world data (emergency room patient visits, pharmacy sales) containing millions of records. The results indicate that T-Cube answers complex queries 1,000 times faster than state-of-the-art commercial time series extraction tools.
This speedup has two main benefits: (1) it enables massive-scale statistical mining of large collections of time series data, and (2) it lets users run many complex ad-hoc queries without inconvenient delays. These benefits have already proved useful in monitoring food and agriculture safety, in detecting emerging patterns of failures in maintenance and supply management systems, and in the original application domain, bio-surveillance.
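The abstract does not describe T-Cube's internals. What follows is therefore only a hedged Python sketch of the kind of query it targets, under assumed conventions: each record is a day plus categorical attributes, and an ad-hoc query is a conjunction of attribute=value constraints whose answer is a series of daily counts. The sketch contrasts a full scan with a naive precomputed cube of count series. The names `scan_query`, `build_cube`, and `cube_query`, the `max_arity` parameter, and the attributes `symptom` and `region` are all hypothetical. The cube is a generic baseline, not the T-Cube structure from the report.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical record layout: (day, {attribute: value}). The attribute names
# used below are illustrative only and do not come from the report.


def scan_query(records, constraints):
    """Baseline: scan every record and count matches per day."""
    counts = defaultdict(int)
    for day, attrs in records:
        if all(attrs.get(k) == v for k, v in constraints.items()):
            counts[day] += 1
    return dict(counts)


def build_cube(records, max_arity=2):
    """Precompute a daily count series for every conjunction of up to
    max_arity attribute=value pairs (a naive full cube, NOT T-Cube)."""
    cube = defaultdict(lambda: defaultdict(int))
    for day, attrs in records:
        items = sorted(attrs.items())  # canonical order so keys match queries
        for r in range(max_arity + 1):  # r == 0 gives the total daily counts
            for combo in combinations(items, r):
                cube[combo][day] += 1
    return cube


def cube_query(cube, constraints, max_arity=2):
    """Answer a query with one dictionary lookup instead of a scan."""
    if len(constraints) > max_arity:
        raise ValueError("query has more constraints than the cube covers")
    key = tuple(sorted(constraints.items()))
    # .get() does not create missing entries; an unseen key means zero matches.
    return dict(cube.get(key, {}))


if __name__ == "__main__":
    records = [
        (date(2007, 1, 1), {"symptom": "cough", "region": "north"}),
        (date(2007, 1, 1), {"symptom": "fever", "region": "north"}),
        (date(2007, 1, 2), {"symptom": "cough", "region": "south"}),
    ]
    cube = build_cube(records)
    query = {"symptom": "cough"}
    # Both paths should return the same daily series for the same query.
    assert scan_query(records, query) == cube_query(cube, query)
    print(cube_query(cube, query))
```

The design trade-off is that the scan costs time proportional to the number of records on every query, while the full cube answers each covered query with a lookup but stores a series for every attribute-value combination. That storage grows combinatorially with the number of attributes and the allowed query size. How T-Cube manages this trade-off is described in the report itself, not in this sketch.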
doi:10.21236/ada471457 fatcat:kpxvv5xcavby5ftnakhuuyhv2y