A Proposal of Scala Script Generation Tool for Extract Transform Load (ETL) Operations

Bhakti Deshpande
2020 International Journal for Research in Applied Science and Engineering Technology  
ETL (Extract, Transform, Load) is a data integration process that combines data from multiple sources into a consolidated view. Data is fetched from numerous data sources, transformed into a particular format, and finally loaded into a suitable data warehouse. Developers currently write Scala code using Spark SQL to perform ETL operations, and whenever the requirements change they have to rewrite that code accordingly. This paper presents an idea for a tool that facilitates the ETL process using Spark. The tool will automatically generate Scala scripts from an uploaded ETL mapping document. To verify the scripts, the system also provides unit test cases and SQL queries using Spark SQL. This will minimize the repetitive tasks faced by developers and provide a robust system that eases the development of ETL operations.
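To make the idea of generating a script from a mapping document concrete, here is a minimal sketch in Scala. It is not the paper's implementation: the `MappingRow` fields, the `ScriptGenerator` object, and the table and column names are all hypothetical, and the sketch assumes each target table is fed by a single source table. It only builds the text of a Spark SQL script and does not need Spark to run.

```scala
// Hypothetical shape of one row in an ETL mapping document (an assumption,
// not a format defined by the paper).
final case class MappingRow(
  sourceTable: String,
  sourceColumn: String,
  transformation: Option[String], // SQL expression; None means copy the column as-is
  targetTable: String,
  targetColumn: String
)

object ScriptGenerator {
  // Builds one SELECT statement from all rows that feed the same target table.
  // Simplification: the source table is taken from the first row of the group.
  def selectFor(rows: Seq[MappingRow]): String = {
    val projections = rows.map { r =>
      val expr = r.transformation.getOrElse(r.sourceColumn)
      s"$expr AS ${r.targetColumn}"
    }
    s"SELECT ${projections.mkString(", ")} FROM ${rows.head.sourceTable}"
  }

  // Emits Scala source text, one statement per target table, sorted by table name.
  // The generated text expects a SparkSession named `spark` to be in scope.
  def generate(rows: Seq[MappingRow]): String =
    rows.groupBy(_.targetTable).toSeq.sortBy(_._1).map { case (target, group) =>
      val sql = selectFor(group)
      s"""spark.sql("$sql").write.mode("overwrite").saveAsTable("$target")"""
    }.mkString("\n")
}

object Demo {
  // Illustrative mapping rows; table and column names are made up.
  val rows: Seq[MappingRow] = Seq(
    MappingRow("staging.customers", "id", None, "dw.dim_customer", "customer_id"),
    MappingRow("staging.customers", "name", Some("UPPER(name)"), "dw.dim_customer", "customer_name")
  )

  def main(args: Array[String]): Unit =
    println(ScriptGenerator.generate(rows))
}
```

Running `Demo` prints a single generated statement that selects `id AS customer_id` and `UPPER(name) AS customer_name` from `staging.customers` and saves the result as `dw.dim_customer`. A real tool would also need to parse the uploaded document, validate the mapping, and escape quotes in transformation expressions; those steps are left out here.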
doi:10.22214/ijraset.2020.7046 fatcat:hqjfgsemtrgz7gfm3jelfhjxqe