Model-driven deployment and management of workflows on analytics frameworks

Merlijn Sebrechts, Sander Borny, Thomas Vanhove, Gregory Van Seghbroeck, Tim Wauters, Bruno Volckaert, Filip De Turck
2016 IEEE International Conference on Big Data (Big Data)
The data science skills shortage means that those who have the knowledge are under constant pressure to do more with less. While data science tools are improving at a staggering pace, the operational tools around them cannot keep up. Even researchers at Google state that automatic configuration and dependency management of services is still an "open, hard problem". As a result, data scientists either constantly have to solve operational challenges themselves or have to work in constant close collaboration with a skilled operations team. This paper addresses the operational challenges behind deploying and managing workflows on top of analytics platforms by starting from three key requirements: data scientists want to model their workflows in a reusable way; this model should be automatically deployed, managed and connected to other services; and the solution should be compatible with existing cloud modeling languages, infrastructure, analytics platforms and tools. The paper explores where the state of the art falls short in meeting these requirements, proposes an architecture to solve the open challenges, and implements and evaluates this architecture.
doi:10.1109/bigdata.2016.7840930 dblp:conf/bigdataconf/SebrechtsBVSWVT16 fatcat:jv3en4igxzdhlmy3w35yhygm54