Supporting Generic Cost Models for Wide-Area Stream Processing

Olga Papaemmanouil, Ugur Çetintemel, John Jannotti
2009 Proceedings / International Conference on Data Engineering  
Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of optimization metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent decentralized algorithms that work on localized, aggregated statistics, while avoiding local optima. To facilitate light-weight dynamic changes to the query deployment, XFlow relies on a loosely-coupled, flexible architecture consisting of multiple publish-subscribe overlay trees that can gracefully scale and adapt to changes in network and workload conditions. Based on the desired performance goals, the system progressively refines the query deployment, the structure of the overlay trees, as well as the statistics collection process. We provide an overview of XFlow's architecture and discuss its decentralized optimization model. We demonstrate its flexibility and effectiveness using real-world streams and experimental results obtained from XFlow's deployment on PlanetLab. The experiments reveal that XFlow can effectively optimize various performance metrics in the presence of varying network and workload conditions.

op_latency = sum(latency, UP_OPERATORS) + processing_latency
query_latency = sum(op_latency, OPERATORS)
system_cost = max(query_latency, QUERIES)
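
The three cost expressions that close the abstract can be read as a layered latency model: a per-operator latency, summed into a per-query latency, with the system cost taken as the worst query. The Python sketch below is one possible reading, not XFlow's implementation. It assumes that `sum(latency, UP_OPERATORS)` means the sum of link latencies from an operator's upstream operators; the `Operator` class, its `upstream_latency` field, and the sample numbers are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operator:
    """Hypothetical operator record; not part of XFlow's API."""
    name: str
    processing_latency: float  # assumed unit: milliseconds
    # Assumed meaning: link latency from each upstream operator to this one.
    upstream_latency: Dict[str, float] = field(default_factory=dict)


def op_latency(op: Operator) -> float:
    # op_latency = sum(latency, UP_OPERATORS) + processing_latency
    return sum(op.upstream_latency.values()) + op.processing_latency


def query_latency(operators: List[Operator]) -> float:
    # query_latency = sum(op_latency, OPERATORS)
    return sum(op_latency(op) for op in operators)


def system_cost(queries: Dict[str, List[Operator]]) -> float:
    # system_cost = max(query_latency, QUERIES)
    return max(query_latency(ops) for ops in queries.values())


# Made-up deployment with two queries, for illustration only.
queries = {
    "q1": [
        Operator("filter", 2.0, {"source_a": 10.0}),
        Operator("join", 5.0, {"filter": 20.0, "source_b": 15.0}),
    ],
    "q2": [
        Operator("aggregate", 3.0, {"source_c": 30.0}),
    ],
}

cost = system_cost(queries)
```

Under this reading, replacing `max` with `sum` in `system_cost` would express total latency across queries instead of worst-case latency, which is the kind of metric swap the abstract describes.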
doi:10.1109/icde.2009.11 dblp:conf/icde/PapaemmanouilCJ09 fatcat:qv6xgekp2fdfrawg3nazgsyswu