Engineering Scalable Distributed Services for Real-Time Big Data Analytics

Sahar Jambi, Kenneth M. Anderson
2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService)
There is high demand for techniques and tools to process and analyze large sets of streaming data in both industrial and academic settings. While existing work in this area has focused on a wide range of issues, including persistence technologies, advanced analysis tools, and functional web interfaces, I focus on query support. In particular, I focus on giving analysts flexibility in the types of queries they can make on large data sets, in real time as well as over historical data. I am building a lightweight service-based framework, EPIC Real-Time, that manages a set of queries that can be applied to user-initiated data analysis events (such as studying tweets generated during a disaster). My prototype combines stream processing and batch processing techniques inspired by the approach embodied in the Lambda Architecture. I investigate a core set of query types that can answer the wide range of queries asked by analysts who study crisis events. For this research, I design and develop a flexible set of real-time analytical tools that will allow analysts to ask new types of questions as they move their research activity from analysis after a crisis to analysis during an event. This will enable them to monitor online social behaviors and capture interesting interactions in real time across the various phases of a disaster. In this dissertation, I present a prototype implementation of EPIC Real-Time that makes use of message-driven and reactive programming techniques. I also present an evaluation of how efficiently the real-time and batch-oriented queries perform and how well these queries meet the needs of Project EPIC analysts, and I provide insight into how EPIC Real-Time performs along a number of non-functional requirements important for big data, such as performance, usability, scalability, and reliability.

ACKNOWLEDGEMENTS
First, I would like to thank my advisor, Professor Ken Anderson, for his valuable guidance and direction. This thesis would not have been possible without his great support, kindness, and patience.
doi:10.1109/bigdataservice.2017.22 dblp:conf/bigdataservice/JambiA17 fatcat:ffx657klkbbqvnv7v6yztb6tfi