Learning Commonalities in RDF [chapter]

Sara El Hassad, François Goasdoué, Hélène Jaudoin
2017 Lecture Notes in Computer Science  
Finding the commonalities between descriptions of data or knowledge is a foundational reasoning problem of Machine Learning introduced in the 70's, which amounts to computing a least general generalization (lgg) of such descriptions. It has also received attention in Knowledge Representation since the 90's, and more recently in the Semantic Web field. We revisit this problem in the popular Resource Description Framework (RDF) of W3C, where descriptions are RDF graphs, i.e., a mix of data
and knowledge. Notably, and in contrast to the literature, our solution to this problem holds for the entire RDF standard, i.e., we do not restrict RDF graphs in any way (neither their structure nor their semantics based on RDF entailment, i.e., inference) and, further, our algorithms can compute lggs of small-to-huge RDF graphs.

The Resource Description Framework (RDF)

RDF graphs. The RDF data model allows specifying RDF graphs. An RDF graph is a set of triples of the form (s, p, o). A triple states that its subject s has the property p, the value of which is the object o. Triples are built using three pairwise disjoint sets: a set U of uniform resource identifiers (URIs), a set L of literals (constants), and a set B of blank nodes, which support incomplete information. Blank nodes are identifiers for missing values in an RDF graph (unknown URIs or literals). Well-formed triples, as per the RDF specification [31], belong to (U ∪ B) × U × (U ∪ L ∪ B); we only consider such triples hereafter.

Notations. We use s, p, o in triples as placeholders. We denote by Val(G) the set of values occurring in an RDF graph G, i.e., its URIs, literals and blank nodes, and by Bl(G) the set of blank nodes occurring in G. A blank node is written b, possibly with a subscript, and a literal is a string between quotes. For instance, the triples (b, hasTitle, "LGG in RDF") and (b, hasContactAuthor, b1) state that something (b) entitled "LGG in RDF" has somebody (b1) as contact author.

A triple models an assertion, either for a class (unary relation) or for a property (binary relation). Table 1 (top) shows the use of triples to state such assertions. The RDF standard [31] provides built-in classes and properties, as URIs within the pre-defined rdf and rdfs namespaces, e.g., rdf:type, which can be used to state that the above b is a conference paper with the triple (b, rdf:type, ConfPaper).

Adding ontological knowledge to RDF graphs. An essential feature of RDF is the possibility to enhance the descriptions in RDF graphs by declaring ontological constraints between the classes and properties they use. This is achieved with RDF Schema (RDFS) statements, which are triples using particular built-in properties. Table 1 (bottom) lists the allowed constraints and the triples to state them; domain and range denote respectively the first and second attribute of every property. For example, the triple (ConfPaper, rdfs:subClassOf, Publication) states that conference papers are publications, the triple (hasContactAuthor, rdfs:subPropertyOf, hasAuthor) states that having a contact author is having an author, the triple (hasAuthor, rdfs:domain, Publication)
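
The data model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Term class, the helper constructors uri, lit and blank, and the function names is_well_formed, val and bl are hypothetical names chosen here for illustration. The sketch encodes each term with its kind, checks membership in (U ∪ B) × U × (U ∪ L ∪ B), and computes Val(G) and Bl(G) on a small graph assembled from the example triples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical term encoding: kind is 'uri', 'literal' or 'blank'."""
    kind: str
    name: str


def uri(name):
    return Term("uri", name)


def lit(name):
    return Term("literal", name)


def blank(name):
    return Term("blank", name)


def is_well_formed(triple):
    """Return True iff the triple belongs to (U ∪ B) × U × (U ∪ L ∪ B)."""
    s, p, o = triple
    return (
        s.kind in ("uri", "blank")
        and p.kind == "uri"
        and o.kind in ("uri", "literal", "blank")
    )


def val(graph):
    """Val(G): every URI, literal and blank node occurring in G."""
    return {term for triple in graph for term in triple}


def bl(graph):
    """Bl(G): the blank nodes occurring in G."""
    return {term for term in val(graph) if term.kind == "blank"}


# Example graph built from triples quoted in the text (data and RDFS statements).
b, b1 = blank("b"), blank("b1")
G = {
    (b, uri("hasTitle"), lit("LGG in RDF")),
    (b, uri("hasContactAuthor"), b1),
    (b, uri("rdf:type"), uri("ConfPaper")),
    (uri("ConfPaper"), uri("rdfs:subClassOf"), uri("Publication")),
    (uri("hasContactAuthor"), uri("rdfs:subPropertyOf"), uri("hasAuthor")),
}

if __name__ == "__main__":
    print(all(is_well_formed(t) for t in G))
    print(sorted(term.name for term in bl(G)))
```

Tagging each term with its kind keeps U, L and B pairwise disjoint by construction, so a literal and a URI with the same spelling remain distinct values. Note that RDFS statements are ordinary triples in this encoding, which is why data and ontological knowledge can sit in the same set.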
doi:10.1007/978-3-319-58068-5_31 fatcat:gftpictqnrfs5ciwzbbthderpy