Fast and Reliable Stream Processing over Wide Area Networks

Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik
2007 IEEE 23rd International Conference on Data Engineering Workshop
We present a replication-based approach that enables both fast and reliable stream processing over wide area networks. Our approach replicates stream processing operators so that operator replicas compete with each other to make the earliest impact. Any processing downstream from such replicas can therefore proceed by relying on the fastest replica, without being held back by slow or failed ones. Furthermore, our approach allows replicas to produce output in different orders so as to avoid the cost of forcing an identical execution across replicas, without sacrificing correctness. We first consider semantic issues for correct replicated stream processing and, based on a formal foundation, extend common stream-processing primitives. Next, we discuss strategies for deploying replicas. Finally, we present preliminary results obtained from experiments on Planet-Lab that substantiate the potential benefits of our approach.

(a) Non-replicated Stream Processing: In this example, tuple (9:00:00, A-C) arrives at the join operator ⋈ on time through stream S1. However, ⋈ cannot immediately process that tuple because its matching tuple (9:00:00, C, 50%) arrives late, having been delayed on the way through stream S3.

(b) Replicated Stream Processing (processing after stream S5 is omitted): Replicating the input flows of ⋈ (data flows S1, S2−S4, S3−S4 and operator U are added) makes ⋈ run in a more timely fashion. Replicated input flows, however, introduce duplicate tuples. To produce correct results, ⋈ has to filter them out (see the struck-through tuples).
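
The duplicate filtering mentioned in caption (b) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the paper extends stream-processing primitives on a formal foundation that this abstract does not spell out. The sketch assumes that two tuples with identical content are copies of the same tuple, that arrivals are already interleaved in arrival order, and that unbounded memory for seen tuples is acceptable. The function name `first_arrival_wins`, the replica labels, and all sample tuples other than (9:00:00, A-C) are hypothetical.

```python
from typing import Hashable, Iterable, Iterator, Tuple

# One arrival is (replica label, tuple payload). Labels are hypothetical.
Arrival = Tuple[str, Hashable]


def first_arrival_wins(arrivals: Iterable[Arrival]) -> Iterator[Hashable]:
    """Yield each distinct tuple once, when its first copy arrives.

    Assumption: identical payloads are duplicates produced by replicated
    input flows, so later copies can be dropped regardless of which
    replica sent them or in what order replicas emit their output.
    """
    seen = set()
    for _replica, payload in arrivals:
        if payload in seen:
            continue  # a slower replica delivered a copy already forwarded
        seen.add(payload)
        yield payload


# Two replicas deliver the same tuples in different orders.
arrivals = [
    ("replica_a", ("9:00:00", "A-C")),
    ("replica_b", ("9:00:01", "B-C")),
    ("replica_b", ("9:00:00", "A-C")),  # duplicate, dropped
    ("replica_a", ("9:00:01", "B-C")),  # duplicate, dropped
    ("replica_a", ("9:00:02", "A-B")),
]
result = list(first_arrival_wins(arrivals))
```

Downstream processing here sees each tuple as soon as the fastest replica delivers it, so a slow or failed replica only costs redundant copies. A practical version would also need some way to discard old entries from `seen`, which this sketch leaves out.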
doi:10.1109/icdew.2007.4401047 dblp:conf/icde/HwangCZ07 fatcat:pya7b5sxbbgfjbmdird67phzc4