Feeding PIDza to VIVO: data ingest with SPARQL-Generate

Maxime Lefrançois, Sandra Mierz
2021-06-23, Zenodo
The first hurdle after installing VIVO is filling it with an initial set of data about an institution, its researchers, and their publications. Done manually, this is a cumbersome and time-consuming process. One way to overcome it is to use open data that carries persistent identifiers (PIDs) such as ROR, ORCID, or DOI. The advantage lies in the reduced processing of input data: because PID-identified data does not need to be disambiguated, the ingestion process reduces to mapping the data to the VIVO ontology. While several tools exist that can import a single PID-identified object into VIVO, the release of DataCite Commons takes this approach to the next level. DataCite Commons offers an interface to a so-called PID-Graph: a structure of multiple connected data objects, each identified by a PID. It enables queries that take advantage of the connections between PIDs, for example querying an organization (identified by a ROR ID), its affiliated persons (identified by their ORCID iDs), and subsequently their publications (identified by DOIs), and thus provides a quick data basis for an empty research information system. In this talk, we will present a microservice that imports data from the DataCite Commons PID-Graph and the ROR API into VIVO (https://github.com/vivo-community/generate2vivo). The microservice is based on lifting rules defined in the SPARQL-Generate RDF transformation language, which we will give an overview of beforehand. SPARQL-Generate is an expressive template-based language for generating RDF streams or text streams from RDF datasets and document streams in arbitrary formats (for more information, see https://w3id.org/sparql-generate/).
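To make the "no disambiguation, only mapping" idea concrete, here is a minimal Python sketch of lifting a nested organization, person, and publication record into RDF triples. It is not the generate2vivo microservice and does not use SPARQL-Generate. The input shape (`SAMPLE`), the `lift` function, the identifiers, and the `http://example.org/placeholder#` linking predicates are all assumptions made for illustration, and the actual VIVO ontology models affiliations and authorships differently. Only the `rdf:type`, `rdfs:label`, FOAF, and BIBO terms are real vocabulary.

```python
from typing import Iterator

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF = "http://xmlns.com/foaf/0.1/"
BIBO = "http://purl.org/ontology/bibo/"
# Hypothetical namespace for the linking predicates; NOT part of VIVO.
EX = "http://example.org/placeholder#"

# Made-up record in an assumed shape; the identifiers are placeholders,
# not real ROR IDs, ORCID iDs, or DOIs.
SAMPLE = {
    "id": "https://ror.org/00example00",
    "name": "Example University",
    "people": [
        {
            "id": "https://orcid.org/0000-0000-0000-0000",
            "name": "Jane Doe",
            "publications": [
                {"id": "https://doi.org/10.0000/example", "title": 'A "quoted" title'},
            ],
        }
    ],
}


def literal(text: str) -> str:
    """Render a plain string as an N-Triples literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str, obj_is_iri: bool = True) -> str:
    """Format one N-Triples statement."""
    rendered = f"<{obj}>" if obj_is_iri else literal(obj)
    return f"<{subject}> <{predicate}> {rendered} ."


def lift(org: dict) -> Iterator[str]:
    """Walk organization -> people -> publications, using each PID as the IRI."""
    yield triple(org["id"], RDF_TYPE, FOAF + "Organization")
    yield triple(org["id"], RDFS_LABEL, org["name"], obj_is_iri=False)
    for person in org.get("people", []):
        yield triple(person["id"], RDF_TYPE, FOAF + "Person")
        yield triple(person["id"], RDFS_LABEL, person["name"], obj_is_iri=False)
        yield triple(person["id"], EX + "affiliatedWith", org["id"])
        for pub in person.get("publications", []):
            yield triple(pub["id"], RDF_TYPE, BIBO + "Document")
            yield triple(pub["id"], RDFS_LABEL, pub["title"], obj_is_iri=False)
            yield triple(pub["id"], EX + "hasAuthor", person["id"])


if __name__ == "__main__":
    for line in lift(SAMPLE):
        print(line)
```

Because every object already carries a PID, the sketch never has to decide whether two records describe the same entity: the PID is reused directly as the subject IRI, and the remaining work is choosing target classes and properties.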
