Instance-based Schema Matching for Web Databases by Domain-specific Query Probing [chapter]

J WANG, J WEN, F LOCHOVSKY, W MA
2004 Proceedings 2004 VLDB Conference  
In a Web database that dynamically provides information in response to user queries, two distinct schemas are presented to users: the interface schema and the result schema. Each of them partially reflects the schema of the backend database. Most previous work has studied only the problem of schema matching across the query interfaces of Web databases. In this paper, we propose a novel schema model that, in particular, distinguishes the interface schema (the schema users can query) and the result schema (the schema users can browse) of a Web database in a specific domain. In this model, we address two significant schema matching problems for Web databases: intra-site schema matching and inter-site schema matching. The first problem is crucial in automatically extracting data from Web databases, while the second problem plays a significant role in meta-retrieving and integrating data from different Web databases. We also investigate the feasibility of a unified solution to the two problems based on query probing and instance-based schema matching techniques. Benefiting from the model, a cross-validation technique is also proposed to improve the accuracy of the various schema matchings. Our experiments on real Web databases demonstrate that the two problems can be solved at the same time with high precision and recall.
Previous approaches ([16], [17], [21]) to matching the schemas of Web databases primarily focus on matching query interfaces (i.e., on inter-site interface schema
1 Attribute matching is the process of determining the semantic correspondences among the attributes of two schemas.
doi:10.1016/b978-012088469-8/50038-3 fatcat:hoaxltzqljg6jeowvsd4jf4iwm