Benchmarking Distributed Stream Processing Engines [article]

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 arXiv pre-print
In recent years, stream data processing has been gaining attention in both industry and academia due to its wide range of applications. To fulfill the need for scalable and efficient stream analytics, numerous open source stream data processing systems (SDPSs) have been developed, with high throughput and low latency being their key performance targets. In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink.
Our evaluation focuses in particular on measuring the throughput and latency of windowed operations. For this benchmark, we design workloads based on real-life, industrial use cases. The main contribution of this work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we completely separate the system under test from the driver, so that the measurement results are closer to actual system performance under real conditions. Third, we build the first driver to test the actual sustainable performance of a system under test. Our detailed evaluation highlights that there is no single winner; rather, each system excels in individual use cases.
arXiv:1802.08496v1 fatcat:zfdxcy3eajcv7eqafymethtvmq