Evaluation of distributed stream processing frameworks for IoT applications in Smart Cities

Hamid Nasiri, Saeed Nasehi, Maziar Goudarzi
2019 Journal of Big Data  
Introduction
A large number of embedded devices, massive volumes of data, users, and applications are driving the digital world to move faster than ever. To be competitive in today's digital economy, companies have to process large volumes of dynamically changing data in real time. Many industries, including health care, e-commerce, insurance, and telecommunications, with use cases such as DNA sequencing, capturing customer insights, real-time offers, high-frequency trading, and real-time intrusion detection, have adopted Big Data analytics to make critical decisions that impact their business [1]. On the other hand, the Internet of Things (IoT) is becoming the primary ground for data mining and Big Data analytics [2]. With the rapid growth of IoT and its use cases in domains such as Smart City, Mobile e-Health, and Smart Grid, streaming applications are driving a new wave of data revolutions. In most IoT applications, the resulting analytics feed back into the system to improve it [3]. Compared to the other Big Data domains, there is a low-latency cycle between system

Abstract
The widespread growth of Big Data and the evolution of Internet of Things (IoT) technologies enable cities to obtain valuable intelligence from the large amount of data they produce in real time. In a Smart City, various IoT devices continuously generate streams of data that need to be analyzed within a short period of time using Big Data techniques. Distributed stream processing frameworks (DSPFs) have the capacity to handle real-time data processing for Smart Cities. In this paper, we examine the applicability of distributed stream processing frameworks at the data processing layer of a Smart City and appraise the current state of their adoption and maturity among IoT applications. Our experiments focus on evaluating the performance of three DSPFs, namely Apache Storm, Apache Spark Streaming, and Apache Flink. According to our results, choosing a proper framework at the data analytics layer of a Smart City requires sufficient knowledge of the characteristics of the target applications. Finally, we conclude that each of the frameworks studied here has its advantages and disadvantages. Our experiments show that Storm and Flink have very similar performance, while Spark Streaming has much higher latency but provides higher throughput.
doi:10.1186/s40537-019-0215-2 fatcat:ijkn7pbgybfz5d2md2xgkmsh2a