A new system for massive RDF data management using Big Data query languages Pig, Hive, and Spark

Banane Mouad et al.
2020 International Journal of Computing and Digital Systems  
The era of big data has emerged. The volume of generated data has never been greater. Massive quantities of data are stored on a huge number of inter-connected servers that share their storage space. Computation methods have been developed to run directly on these machines, which were previously used mainly for storage. Tools such as Hive, Pig, and Spark provide the means for data query and analysis but are not suitable for semantic data. For this kind of data, a [...]zed tool called SPARQL is dedicated to querying semantic data represented in the Resource Description Framework (RDF). The aim of our work is to transform a given SPARQL query into a Hive program, a Pig program, or a Spark script, according to the user's choice. To achieve this goal, we propose a Model-Driven Approach that consists of creating a metamodel for each of these tools and defining a mapping between the SPARQL metamodel on the one hand and the metamodel of each Big Data query language (Pig, Hive, and Spark) on the other. The transformation is then performed using the Atlas Transformation Language (ATL). To validate our approach, we conducted an experiment on three datasets containing a large volume of distributed RDF data on a powerful server cluster.
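To make the idea of a SPARQL-to-Hive mapping concrete, the following is a minimal sketch in Python, not the ATL-based, metamodel-driven transformation the paper describes. It assumes the RDF data sits in a single three-column table (hypothetically named `triples`, with columns `s`, `p`, `o`) and that the query is a plain basic graph pattern whose constants contain no quote characters. The function name `bgp_to_hiveql` and the tuple encoding of triple patterns are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Tuple

# A triple pattern is (subject, predicate, object); variables start with "?".
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Return True if the term is a SPARQL variable such as ?name."""
    return term.startswith("?")


def bgp_to_hiveql(select_vars: List[str], patterns: List[Triple],
                  table: str = "triples") -> str:
    """Translate a basic graph pattern into a HiveQL-style self-join.

    Assumption: all triples live in one table with columns s, p, o.
    Each triple pattern becomes one aliased copy of that table.
    """
    columns = ("s", "p", "o")
    bindings: Dict[str, str] = {}  # variable -> first column that binds it
    conditions: List[str] = []

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        for col, term in zip(columns, pattern):
            ref = f"{alias}.{col}"
            if is_var(term):
                if term in bindings:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                # A constant becomes an equality filter.
                conditions.append(f"{ref} = '{term}'")

    missing = [v for v in select_vars if v not in bindings]
    if missing:
        raise ValueError(f"unbound variables in SELECT: {missing}")

    projection = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    query = f"SELECT {projection} FROM {from_clause}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    # SELECT ?name ?age WHERE { ?person foaf:name ?name . ?person foaf:age ?age }
    example = [("?person", "foaf:name", "?name"),
               ("?person", "foaf:age", "?age")]
    print(bgp_to_hiveql(["?name", "?age"], example))
```

Each triple pattern maps to one copy of the table, shared variables turn into join conditions, and constants turn into filters. A model-driven approach would express the same correspondences as rules between metamodel elements rather than as string construction.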
doi:10.12785/ijcds/090211 fatcat:en4ndsz26vf65iyikvc47un4nm