A Proposal of Scala Script Generation Tool for Extract Transform Load (ETL) Operations
International Journal for Research in Applied Science and Engineering Technology
ETL is a data integration process that combines data from multiple sources into a consolidated view in three steps: extract, transform, and load. Data is fetched from numerous data sources, transformed into a particular format, and finally loaded into a suitable data warehouse. Developers currently write Scala code using Spark SQL to perform ETL operations, and whenever the requirements change they have to rewrite that code accordingly. This paper presents an idea for a tool that facilitates the ETL process using Spark. The tool will automatically generate Scala scripts from an uploaded ETL mapping document. To verify the generated scripts, the system also provides unit test cases and SQL queries using Spark SQL. This will minimize the repetitive tasks faced by developers and provide a robust system that eases the development of ETL operations.
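The idea of generating a Scala script from a mapping document can be illustrated with a minimal sketch. It is not the tool proposed in the paper: the `Mapping` case class, the `ScriptGenerator` object, and the table and column names are all hypothetical, and the mapping document is assumed to have already been parsed into rows of source column, target column, and an optional SQL transformation. The generator itself is plain Scala and only emits text; the emitted script is the part that would depend on Spark.

```scala
// Hypothetical representation of one row of an ETL mapping document.
final case class Mapping(
  sourceColumn: String,
  targetColumn: String,
  transform: Option[String] // optional SQL expression, e.g. "UPPER(name)"
)

// Hypothetical generator: turns parsed mapping rows into the text of a Spark SQL script.
object ScriptGenerator {

  // Builds one projection item: the transform if given, otherwise the raw source column.
  def selectExpr(m: Mapping): String = {
    val expr = m.transform.getOrElse(m.sourceColumn)
    s"$expr AS ${m.targetColumn}"
  }

  // Assembles the full script text. Expressions containing double quotes
  // are not escaped here; a real tool would have to handle that.
  def generate(sourceTable: String, targetTable: String, mappings: Seq[Mapping]): String = {
    val projection = mappings.map(selectExpr).mkString(", ")
    s"""import org.apache.spark.sql.SparkSession
       |
       |object GeneratedEtl {
       |  def main(args: Array[String]): Unit = {
       |    val spark = SparkSession.builder().appName("GeneratedEtl").getOrCreate()
       |    val result = spark.sql("SELECT $projection FROM $sourceTable")
       |    result.write.mode("overwrite").saveAsTable("$targetTable")
       |    spark.stop()
       |  }
       |}
       |""".stripMargin
  }
}
```

Calling `ScriptGenerator.generate("staging.customers", "dw.dim_customer", Seq(Mapping("id", "customer_id", None), Mapping("name", "customer_name", Some("UPPER(name)"))))` returns the source of a small Spark job. When the mapping document changes, only the input rows change and the script is regenerated rather than rewritten by hand, which is the repetitive work the abstract aims to remove.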