Privacy-Preserving Schema Matching Using Mutual Information [chapter]

Isabel F. Cruz, Roberto Tamassia, Danfeng Yao
2007 Lecture Notes in Computer Science  
The problem of schema or ontology matching is to define mappings among schema or ontology elements. Such mappings are typically defined between two schemas or two ontologies at a time. Ideally, using the defined mappings, one would be able to issue a single query that is rewritten automatically for all the databases, instead of manually writing a query to each database. In a centrally mediated architecture, a query is written in terms of a global schema or ontology that integrates all the database schemas or ontologies, while in a peer-to-peer architecture a query is written in terms of the schema or ontology of any of the peer databases. Automatic schema matching approaches can use only the schema, only the instances, or a combination of both. Mappings can take into account not only concept properties (e.g., string similarity), but also constraints (e.g., relationship cardinality) and schema structure (e.g., graph similarity) [9].

Security and privacy issues arise in the context of data integration. For example, previous work looks into secure access to mediated data [2, 4]. Other work has defined the concept of minimal necessary information sharing that applies to querying: in computing the answer to a query, only the query result should be revealed [1]. Most matching approaches rely on both schemas or ontologies being completely visible to both parties. Clearly, this approach disregards security and privacy considerations. Even within the same organization, different users have access to different database views. It is, therefore, only natural to create automatic mechanisms by which mappings can be established between a pair of schemas or ontologies without either party needing to reveal its entire metadata.

Clifton et al. discuss issues and identify research directions in privacy-preserving data integration, including those that arise in schema matching [3]. More recently, Mitra et al. look at the specific issue of privacy-preserving ontology matching [7, 8]. In their approach, terms in the ontologies and in the matching rules (which define the mappings) are encrypted, so that the mediator does not see the actual terms. However, during the ontology matching process, which is semi-automatic, a human expert has access to both ontologies in cleartext (using a session key). We propose an automatic privacy-preserving schema matching protocol.
The result of this protocol is the set of mappings between attributes in the schemas of the two
doi:10.1007/978-3-540-73538-0_7 fatcat:nryt4tirfbc3vcih5emwhflg7m