Join Queries over Data Streams and Their Optimization using Dynamic Metadata

Michael Cammert
2015
Due to technological progress, there has been an enormous increase in the number of continuous data streams from which valuable information has to be derived as fast as possible. Therefore, data stream management systems have emerged as a new technology for processing continuous queries over data streams. In contrast to databases, they primarily operate in memory and are optimized for processing continuous queries over data streams. During the development of this new kind of system, adopting the concept of relational operator graphs from database theory has proven valuable, provided the graphs are executed in a data-driven rather than a demand-driven manner. Assigning validity intervals to the elements of the data streams solves the problem of processing potentially unbounded data streams with bounded resources. The capabilities of such systems depend considerably on the availability of efficient and well-defined techniques for combining information from different data streams. The objective of this thesis is therefore to transfer the proven concept of the relational join to data-driven data stream processing using validity intervals. For this purpose, the semantics of the join operation for data streams is derived from that of the extended relational algebra using the concept of snapshot-reducibility. Several join algorithms are presented and proven to comply with these semantics. Consistently parameterizing the techniques with respect to the data structures used for storing state allows a large variety of join predicates to be supported. Well-known join processing techniques based on nested loops, hashing, or indexing are adapted for data stream processing. Additionally, the Temporal Progressive-Merge-Join is introduced as an algorithm that allows the join to be derived by using value-based sorting of the data st [...]
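
The idea of a data-driven join over elements carrying validity intervals can be illustrated with a minimal Python sketch. It is not taken from the thesis: the names `Element` and `SymmetricHashJoin`, the half-open interval convention `[start, end)`, the restriction to equi-joins, and the assumption that each input stream arrives ordered by start timestamp are all illustrative assumptions. Each result carries the intersection of the two input intervals, which is one simple way to make the output at any time instant agree with a relational join over the elements valid at that instant.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, List, NamedTuple


class Element(NamedTuple):
    """A stream element with a half-open validity interval [start, end)."""
    payload: Any
    start: int
    end: int


class SymmetricHashJoin:
    """Hypothetical equi-join over two streams of Element values.

    Assumption: each input stream is ordered by start timestamp, so an
    arriving element with start t proves that stored elements of the
    opposite side with end <= t can never find a future join partner.
    """

    def __init__(self,
                 left_key: Callable[[Any], Hashable],
                 right_key: Callable[[Any], Hashable]) -> None:
        self._keys = (left_key, right_key)
        # One hash table per input: join key -> stored elements.
        self._state = (defaultdict(list), defaultdict(list))

    def insert(self, side: int, elem: Element) -> List[Element]:
        """Process one arriving element (side 0 = left, 1 = right)."""
        own, other = self._state[side], self._state[1 - side]

        # Purge expired elements from the opposite state. A linear scan
        # keeps the sketch short; a queue ordered by end would be cheaper.
        for k in list(other):
            live = [e for e in other[k] if e.end > elem.start]
            if live:
                other[k] = live
            else:
                del other[k]

        # Probe the opposite state and emit results whose validity is
        # the intersection of both input intervals.
        key = self._keys[side](elem.payload)
        results = []
        for match in other.get(key, []):
            start = max(elem.start, match.start)
            end = min(elem.end, match.end)
            if start < end:
                pair = ((elem.payload, match.payload) if side == 0
                        else (match.payload, elem.payload))
                results.append(Element(pair, start, end))

        # Store the new element for future probes from the other side.
        own[key].append(elem)
        return results
```

Swapping the hash tables for plain lists and the key comparison for an arbitrary predicate would turn the same skeleton into a nested-loops variant, which is the kind of parameterization by state data structure that the abstract describes.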
doi:10.17192/z2014.0500 fatcat:vbqzllg2xvhanhtjqadjsm2gpm