Using semantic links to support top-K join queries in peer-to-peer networks

Jie Liu, Liang Feng, Hai Zhuge
2007 Concurrency and Computation  
An important issue raised in peer-to-peer (P2P) applications is how to accurately and efficiently retrieve a set of K best matching data objects from different sources while minimizing the number of objects that have to be accessed. The proposed solution is to organize peers by a semantic link network representing the semantic relationships between peers' data schemas. Queries are routed only to semantically relevant peers. A pruning-based local top-K ranking approach is proposed to reduce the transmitted data by pruning tuples that cannot produce the desired join results with a rank value at least equal to the lowest rank value generated. Experiments evaluate its performance in terms of the number of transmitted tuples and the miss rate. Comparison with the traditional threshold algorithm for centralized systems and other top-K ranking algorithms for P2P networks shows the features of the proposed approach.

Peer data management systems (PDMSs) can be classified as unstructured or structured according to the topology of the underlying P2P network. Unstructured PDMSs use peer clustering and peer indexing approaches rather than flooding or random selection for query routing. Structured PDMSs use a distributed hash table (DHT) for query routing, which can find information within a bounded number of hops in large-scale P2P networks but does not support complex queries. PDMSs provide a scalable basis for building Semantic Web applications. However, existing P2P systems do not have the data management capabilities that are typically found in relational databases [4] [5] [6] [7]. An important issue is how to efficiently return the top-K results rather than all the satisfactory answers from multiple data sources with the minimum transmission cost. Usually K is quite small compared to the total number of satisfactory answers.

... tuples and the top-K miss rate. The proposed approach can be used in the P2P-based Knowledge Grid [34] [35] [36] to support advanced applications in P2P knowledge management. Our work will continue in three main areas. First, we plan to incorporate query optimization techniques, such as Bloom filters and ripple join algorithms, with the proposed top-K join approach to reduce the transmission cost, the response time, and the miss rate. Second, we plan to apply the proposed approach to answering top-K join queries in XML.
Finally, we plan to look for a good method of answering top-K join queries in structured P2P networks.
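
The pruning idea summarized in the abstract (drop tuples that cannot yield a join result ranked at least as high as the lowest rank value generated so far) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes only two peers, a rank value equal to the sum of the two tuples' scores, and a remote peer that sends its tuples in descending score order in small batches. The function name `topk_join_with_pruning` and the parameters `local`, `remote`, `k`, and `batch_size` are hypothetical. Under these assumptions the sketch never misses a result, whereas the paper reports a miss rate, so the actual approach evidently differs.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

ScoredTuple = Tuple[str, float]  # (join_key, score); hypothetical layout


def topk_join_with_pruning(
    local: List[ScoredTuple],
    remote: List[ScoredTuple],
    k: int,
    batch_size: int = 2,
) -> Tuple[List[Tuple[float, str]], int]:
    """Return (top-k join results, number of remote tuples transmitted).

    Assumption: the rank value of a join result is local score + remote score.
    """
    if k <= 0:
        return [], 0

    # Index the querying peer's tuples by join key.
    local_by_key: Dict[str, List[float]] = defaultdict(list)
    for key, score in local:
        local_by_key[key].append(score)
    # Best score any local tuple can contribute to a join result.
    max_local = max((score for _, score in local), default=0.0)

    # Assumption: the remote peer can scan its tuples by descending score.
    remote_sorted = sorted(remote, key=lambda t: t[1], reverse=True)

    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    transmitted = 0
    pos = 0
    while pos < len(remote_sorted):
        # Lowest rank value generated so far, once k results exist.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Prune tuples whose best possible rank value is below the threshold.
        batch = [
            t for t in remote_sorted[pos:pos + batch_size]
            if t[1] + max_local >= threshold
        ]
        pos += batch_size
        if not batch:
            break  # descending order: every later tuple would be pruned too
        transmitted += len(batch)
        for key, remote_score in batch:
            for local_score in local_by_key.get(key, ()):
                total = local_score + remote_score
                if len(heap) < k:
                    heapq.heappush(heap, (total, key))
                elif total > heap[0][0]:
                    heapq.heapreplace(heap, (total, key))
    return sorted(heap, reverse=True), transmitted
```

In this sketch the number of transmitted tuples is simply the count of remote tuples that survive pruning, which loosely mirrors the transmission metric named in the abstract. A larger `batch_size` means fewer rounds but a staler threshold, so fewer tuples are pruned.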
doi:10.1002/cpe.1145 fatcat:c2vnwnrjlban7fq3vfi6uhgcz4