Data Stream Query Processing [chapter]

Nick Koudas, Divesh Srivastava
2003 Proceedings 2003 VLDB Conference  
Measuring and monitoring complex, dynamic phenomena (traffic evolution in internet and telephone communication infrastructures, usage of the web, email and newsgroups, movement of financial markets, atmospheric conditions) produces highly detailed stream data, i.e., data that arrives as a series of "observations", often very rapidly. With traditional data feeds, one modifies and augments underlying databases and data warehouses: complex queries over the data are performed in an offline fashion, and real-time queries are typically restricted to simple filters. However, the monitoring applications that operate on modern data streams require sophisticated real-time queries (often in an exploratory mode) to identify, e.g., unusual or anomalous activity (such as network intrusion detection or telecom fraud detection), based on intricate relationships between the values of the underlying data streams.

Stream data are also generated naturally by (message-based) web services, in which loosely coupled systems interact by exchanging high volumes of business data (e.g., purchase orders, retail transactions) tagged in XML (the lingua franca of web services), forming continuous XML data streams. A central aspect of web services is the ability to operate efficiently on these XML data streams, executing queries (expressed in some XML query language) to continuously match, extract and transform parts of the XML data stream to drive legacy back-end business applications.

Manipulating stream data presents many technical challenges, which are just beginning to be addressed in the database, systems, algorithms, networking and other computer science communities. This is an active research area in the database community, involving new stream operators, SQL extensions, query optimization methods, operator scheduling techniques, etc., with the goal of developing general-purpose (e.g., NiagaraCQ, Stanford Stream, Telegraph, Aurora) and specialized (e.g., Gigascope) data stream management systems. The objective of this tutorial is to provide a comprehensive and cohesive overview of the key research results in the area of data stream query processing, for both SQL-like and XML query languages.

Tutorial Outline

The tutorial is example-driven and organized as follows.

Applications, Query Processing Architectures: Data stream applications, data and query characteristics, query processing architectures of commercial and prototype systems.
Stream SQL Query Processing: Filters, simple and complex joins, aggregation, SQL extensions, approximate answers, query optimization methods, operator scheduling techniques.
Stream XML Query Processing: Automata- and navigation-based techniques for single and multiple XPath queries, connections with stream SQL query processing.
doi:10.1016/b978-012722442-8/50128-2 dblp:conf/vldb/KoudasS03 fatcat:2fc26gzoura3tff74v33eylvcy