The SPARQL2XQuery interoperability framework

Nikos Bikakis, Chrisa Tsinaraki, Ioannis Stavrakantonakis, Nektarios Gioldasis, Stavros Christodoulakis
2014 World wide web (Bussum)  
The Web of Data is an open environment consisting of a great number of large inter-linked RDF datasets from various domains. In this environment, organizations and companies adopt the Linked Data practices utilizing Semantic Web (SW) technologies, in order to publish their data and offer SPARQL endpoints (i.e., SPARQL-based search services). On the other hand, the dominant standard for information exchange in the Web today is XML. Additionally, many international standards (e.g., Dublin Core,
... TS, TEI, IEEE LOM) in several domains (e.g., Digital Libraries, GIS, Multimedia, e-Learning) have been expressed in XML Schema. These developments have led to an increasing emphasis on XML data, accessed using the XQuery query language. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to develop interoperability mechanisms that allow Web of Data users to access XML datasets, using SPARQL, from their own working environments. It is unrealistic to expect that all existing legacy data (e.g., relational, XML) will be transformed into SW data. Therefore, publishing legacy data as Linked Data and providing SPARQL endpoints over them has become a major research challenge. In this direction, we introduce the SPARQL2XQuery Framework, which creates an interoperable environment where SPARQL queries are automatically translated to XQuery queries in order to access XML data across the Web. The SPARQL2XQuery Framework provides a mapping model for the expression of OWL-RDF/S to XML Schema mappings, as well as a method for SPARQL to XQuery translation. To this end, our Framework supports both manual and automatic mapping specification between ontologies and XML Schemas. In the automatic mapping specification scenario, SPARQL2XQuery exploits the XS2OWL component, which transforms XML Schemas into OWL ontologies. Finally, extensive experiments have been conducted to evaluate the schema transformation, mapping generation, query translation and query evaluation efficiency, using both real and synthetic datasets.

... transforming XML data to RDF data (and vice versa). Moreover, W3C investigates the XSPARQL approach for merging XQuery and SPARQL for transforming XML to RDF data (and vice versa). The recent efforts in bridging the SW and XML worlds focus on data transformation (i.e., XML data to RDF data and vice versa).
However, despite the significant body of related work on SPARQL to SQL translation, to the best of our knowledge, there is no work addressing the SPARQL to XQuery translation problem. Given the high importance of XML and the related standards in the Web, this is a major shortcoming in the state of the art. Finally, as far as the Linked Data context is concerned, publishing legacy data and offering SPARQL endpoints over them has recently become a major research challenge. Although several systems (e.g., D2R Server [106], SparqlMap [107], Quest [108], Virtuoso [109], TopBraid Composer) offer SPARQL endpoints¹⁰ over relational data, to the best of our knowledge, there is no system supporting XML data.

This paper presents SPARQL2XQuery, a framework that provides transparent access over XML in the Web of Data (WoD). Using the SPARQL2XQuery Framework, XML datasets can be turned into SPARQL endpoints. The SPARQL2XQuery Framework provides a method for SPARQL to XQuery translation, with respect to a set of predefined mappings between ontologies¹¹ and XML Schemas. To this end, our Framework supports both manual and automatic mapping specifications between ontologies and XML Schemas, as well as a schema transformation mechanism.

Motivating Example

Here, we outline two scenarios in order to illustrate the need for bridging the SW and XML worlds in several circumstances. In our examples, three hypothetically autonomous partners are involved: (a) Digital Library X (which belongs to an institution or a company), (b) Organization A and (c) Organization Z. Each has adopted different technologies to represent and manage its data. Assume that Digital Library X has adopted XML-related technologies (i.e., XML, XML Schema, and XQuery) and its contents are described in XML syntax, while both organizations have chosen SW technologies (i.e., RDF/S, OWL, and SPARQL).

1st Scenario. Consider that Digital Library X wants to publish its data in the WoD using SW technologies, a common scenario in the Linked Data era. In this case, a schema transformation mechanism and a query translation mechanism are required. Using the schema transformation mechanism, the XML Schema of Digital Library X will be transformed to an ontology. Then, the query translation mechanism will be used to translate the SPARQL queries posed over the generated ontology to XQuery queries over the XML data.

2nd Scenario. Consider WoD users and/or applications that express their queries or have implemented their query APIs using the ontologies of Organization A and/or Organization Z. These users and applications should be able to access Digital Library X directly from the SW environment, without changing their working environment (e.g., query language, schema, API, etc.). In this scenario, a mapping model and a query translation mechanism are required. In such a case, an expert specifies the mappings between the Organization ontologies and the XML Schema of Digital Library X. These mappings are then exploited by the query translation mechanism in order to translate the SPARQL queries posed over the Organization ontologies to XQuery queries to be evaluated over the XML data of Digital Library X. It should be noted that in most real-world situations, an XML Schema may be mapped to more than two ontologies.

¹⁰ Virtual SPARQL endpoints (i.e., with no need to transform the relational data to RDF data).
¹¹ Throughout this paper we use the term ontology as equivalent to a schema definition that has been expressed in RDFS or OWL syntax. Such a schema definition may describe an ontology, i.e., a formal, explicit specification of a shared conceptualization [31].
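To make the idea of mapping-driven query translation in the two scenarios more concrete, the following is a minimal sketch in Python. It is not the mapping model or the translation method of the SPARQL2XQuery Framework, whose details are not given in this excerpt. It only illustrates, under simplifying assumptions, how a single SPARQL triple pattern such as `?book ex:title ?title` could be rewritten as an XQuery FLWOR expression once an ontology property has been associated with an XPath. The names `PROPERTY_TO_XPATH`, `translate_triple_pattern`, the `ex:` properties and the file `library.xml` are all hypothetical.

```python
from typing import Dict

# Hypothetical mapping (an assumption for illustration only): each ontology
# property is paired with the XPath of the XML elements that hold its values.
PROPERTY_TO_XPATH: Dict[str, str] = {
    "ex:title": "/library/book/title",
    "ex:author": "/library/book/author",
}


def translate_triple_pattern(subject_var: str, prop: str, object_var: str,
                             mappings: Dict[str, str], doc_uri: str) -> str:
    """Rewrite the triple pattern `?subject_var prop ?object_var` as XQuery.

    Simplifying assumption: the subject corresponds to the parent element of
    the mapped XPath, and the object to the mapped element itself.
    """
    if prop not in mappings:
        raise KeyError(f"no mapping defined for property {prop!r}")
    # Split "/library/book/title" into "/library/book" and "title".
    parent_path, _, leaf = mappings[prop].rpartition("/")
    return (
        f'for ${subject_var} in doc("{doc_uri}"){parent_path}\n'
        f"for ${object_var} in ${subject_var}/{leaf}\n"
        f"return <result>{{string(${object_var})}}</result>"
    )


if __name__ == "__main__":
    # Toy translation of the pattern: ?book ex:title ?title
    print(translate_triple_pattern("book", "ex:title", "title",
                                   PROPERTY_TO_XPATH, "library.xml"))
```

In this toy setting, the mapping table plays the role that the expert-specified (2nd scenario) or automatically generated (1st scenario) mappings play in the text: the SPARQL side never mentions XML paths, and the translation step is the only place where the two schemas meet.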
doi:10.1007/s11280-013-0257-x fatcat:cb7hq37tufa33nmfwuqzxkxff4