MAIME: A Maintenance Manager for ETL Processes

Darius Butkevicius, Philipp Daniel Freiberger, Frederik Madsen Halberg, Jacob Bach Hansen, Søren Jensen, Michael Tarp, Harry Xuegang Huang, Thomsen
2017 Proceedings of the Workshops of the EDBT/ICDT 2017 Joint Conference   unpublished
The proliferation of business intelligence applications has moved most organizations into an era where data is an essential success factor. Businesses have therefore put increasing focus on integrating and processing data in the enterprise environment, and developing and maintaining Extract-Transform-Load (ETL) processes has become critical in most data-driven organizations. External Data Sources (EDSs) often change their schema, which can leave the ETL processes that extract data from those EDSs invalid. Repairing these ETL processes is time-consuming and tedious. As a remedy, we propose MAIME, a tool to (semi-)automatically maintain ETL processes. MAIME works with SQL Server Integration Services (SSIS) and uses a graph model as a layer of abstraction on top of SSIS Data Flow tasks (ETL processes). We introduce a graph alteration algorithm which propagates detected EDS schema changes through the graph. Modifications made to a graph are applied directly to the underlying ETL process. How MAIME handles EDS schema changes can be configured for different SSIS transformations. For the considered set of transformations, MAIME can maintain SSIS Data Flow tasks (semi-)automatically. In an evaluation, compared to manual maintenance, the number of user inputs decreased by a factor of 9.5 and the time spent by a factor of 9.8.
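
The idea of propagating a schema change through a graph under per-transformation configuration can be illustrated with a small sketch. This is not MAIME's algorithm or the SSIS API: the `Vertex` class, the `propagate_deletion` function, the policy names `PROPAGATE` and `BLOCK`, and the example column and transformation names are all assumptions made for illustration. It handles only one kind of change (a deleted column) and walks the graph breadth-first from the source.

```python
from collections import deque

# Hypothetical policy names. The abstract only states that handling is
# configurable per transformation type, not what the options are called.
PROPAGATE, BLOCK = "propagate", "block"


class Vertex:
    """One transformation in a simplified data-flow graph (illustrative)."""

    def __init__(self, name, kind, uses):
        self.name = name
        self.kind = kind          # transformation type, e.g. "aggregate"
        self.uses = set(uses)     # columns this transformation depends on
        self.successors = []


def propagate_deletion(source, column, policy):
    """Push the deletion of `column` from `source` through the graph.

    Returns (modified, blocked): names of vertices that were repaired and
    names of vertices whose policy forbids an automatic repair.
    """
    modified, blocked = [], []
    seen = {source.name}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if column in v.uses:
            # Unknown transformation types default to BLOCK (an assumption).
            if policy.get(v.kind, BLOCK) == BLOCK:
                blocked.append(v.name)
                continue          # do not alter anything past this vertex
            v.uses.discard(column)
            modified.append(v.name)
        for s in v.successors:
            if s.name not in seen:
                seen.add(s.name)
                queue.append(s)
    return modified, blocked


# Example: source -> derive -> aggregate -> dest, with column "age" deleted.
src = Vertex("source", "source", ["id", "age"])
derive = Vertex("derive", "derived_column", ["age"])
agg = Vertex("aggregate", "aggregate", ["age"])
dest = Vertex("dest", "destination", ["id", "age"])
src.successors.append(derive)
derive.successors.append(agg)
agg.successors.append(dest)

policy = {"source": PROPAGATE, "derived_column": PROPAGATE,
          "aggregate": BLOCK, "destination": PROPAGATE}
modified, blocked = propagate_deletion(src, "age", policy)
```

In this sketch the source and the derived-column vertex are repaired automatically, while the aggregate vertex is reported as blocked and nothing downstream of it is touched. A blocked vertex is where a semi-automatic tool would ask the user for input.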