Kafka-ML: Connecting the data stream with ML/AI frameworks

Cristian Martín, Peter Langendoerfer, Pouya Soltani Zarrin, Manuel Díaz, Bartolomé Rubio
2021, Future Generation Computer Systems
Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train, improve, and make predictions through their algorithms. With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data into continuous data streams. However, most of the ML/AI frameworks used nowadays are not fully prepared for this revolution. In this paper, we propose Kafka-ML, a novel and open-source framework that enables the management of ML/AI pipelines through data streams. Kafka-ML provides an accessible and user-friendly Web user interface where users can easily define ML models and then train, evaluate, and deploy them for inference. Kafka-ML itself and the components it deploys are fully managed through containerization technologies, which ensure their portability, easy distribution, and other features such as fault tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.

The average throughput of this scenario is shown in Figure 13. In this case, the highest throughput is obtained with 8-16 replicas and 8 clients. This may be due to the overload of clients that Kafka-ML can handle with a single ...

17 https://github.com/uTensor/uTensor
18 https://www.tensorflow.org/lite

Figure 9: Training management and visualization in Kafka-ML

Bartolomé Rubio received his MS and PhD degrees in Computer Engineering from the University of Málaga in 1990 and 1998, respectively. From 1991 to 2000 he was an Assistant Professor at the Department of Languages and Computer Science of the University of Málaga. Since 2001 he has been an Associate Professor in the same department.
He has been working in the areas of distributed and parallel programming and of coordination models and languages. Currently, he is especially involved in the research fields of Wireless Sensor and Actor Networks and the integration of the Internet of Things and Cloud Computing. He has been a member of the Software Engineering group of the University of Málaga (GISUM) since its foundation and has recently become a member of the ITIS Software Institute of the University of Málaga.

CRediT author statement
Cristian Martín: Software development and conceptualization. First manuscript draft.
Peter Langendoerfer, Manuel Díaz, and Bartolomé Rubio: Supervised the research, conceptualization, manuscript review, and funding.
Pouya Soltani Zarrin: Helped in integrating with the Exasens dataset, use case writing, and manuscript review.
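The abstract says that data streams can be managed and reused, possibly removing the need for separate data storage. The minimal sketch below illustrates that idea under stated assumptions; it is not Kafka-ML's implementation. It assumes that a dataset is identified only by a topic name plus an offset range, carried in a small "control message". A training job can then replay the same records from the retained log once per epoch. Several pieces are hypothetical stand-ins:

- An in-memory dict of lists replaces Apache Kafka topics.
- A plain SGD linear model replaces the ML/AI frameworks Kafka-ML actually drives.
- The names `StreamLog`, `StreamSlice` and `train_from_stream` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# (features, label) pair carried by each stream record.
Record = Tuple[List[float], float]


class StreamLog:
    """Hypothetical in-memory stand-in for Kafka topics: each topic is an
    append-only list whose indices play the role of record offsets."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[Record]] = {}

    def append(self, topic: str, record: Record) -> int:
        """Append a record to a topic and return its offset."""
        records = self._topics.setdefault(topic, [])
        records.append(record)
        return len(records) - 1

    def read(self, topic: str, start: int, end: int) -> Iterator[Record]:
        """Yield the records of `topic` with offsets in [start, end)."""
        yield from self._topics[topic][start:end]


@dataclass(frozen=True)
class StreamSlice:
    """Hypothetical control message naming the part of a topic that forms a dataset."""
    topic: str
    start: int
    end: int


def train_from_stream(log: StreamLog, data: StreamSlice,
                      epochs: int = 500, lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit a linear model y = w.x + b with per-record SGD.

    Each epoch replays the slice from the log rather than from a
    separately stored copy of the dataset.
    """
    weights: List[float] = []
    bias = 0.0
    for _ in range(epochs):
        for features, label in log.read(data.topic, data.start, data.end):
            if not weights:
                weights = [0.0] * len(features)
            prediction = sum(w * x for w, x in zip(weights, features)) + bias
            error = prediction - label
            # Gradient step on the squared error of this single record.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    log = StreamLog()
    # Five records following y = 2x + 1 form the training dataset.
    offsets = [log.append("training-data", ([x], 2.0 * x + 1.0))
               for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
    # A later record on the same topic lies outside the slice and is ignored.
    log.append("training-data", ([0.5], 100.0))
    dataset = StreamSlice("training-data", offsets[0], offsets[-1] + 1)
    w, b = train_from_stream(log, dataset)
    print(f"w={w[0]:.3f}, b={b:.3f}")  # should print w=2.000, b=1.000
```

Because the dataset is only an offset range, a second training run can reuse the same stream, for example with a different learning rate, without the data ever being exported to a file system. In a real Kafka deployment this holds only while the broker still retains those records.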
doi:10.1016/j.future.2021.07.037 fatcat:gfwq5qo4frabhjhqen3ayugoni