BSI: Bloom Filter-Based Semantic Indexing for Unstructured P2P Networks
International Journal of Peer to Peer Networks
Resource management and search are important yet challenging in large-scale distributed systems such as P2P networks. Most existing P2P systems rely on indexing to route queries efficiently over the network. However, searches based on such indices face two key issues. First, most existing search schemes rely on simple keyword-based indices that support only exact string matches and do not take the meaning of words into account. Second, it is difficult, if not impossible, to devise query-based indexing schemes that can represent all possible concept combinations without resulting in exponential index sizes. To address these problems, we present BSI, a novel P2P indexing and query routing strategy that supports semantic-based content searches. The BSI indexing structure captures the semantic content of documents using a reference ontology. Our indexing scheme efficiently handles multi-concept queries by maintaining summary-level information for each individual concept and for concept combinations in a novel space-efficient data structure, the Two-level Semantic Bloom Filter (TSBF). By using TSBFs to represent a large document and query base, BSI significantly reduces the communication and storage costs of indices. Furthermore, we devise a low-overhead mechanism that allows peers to dynamically estimate, with high accuracy, the relevance strength of a peer for multi-concept queries based solely on TSBFs. We also propose a routing index compression mechanism that respects peers' dynamic storage limitations with minimal loss of information by exploiting the structure of the reference ontology. Based on the proposed index structure, we design a novel query routing algorithm that exploits semantic information to route queries to semantically relevant peers. Performance evaluation demonstrates that our proposed approach can improve the search recall of unstructured P2P systems by up to 383.71% while keeping the communication cost low compared to the state-of-the-art search mechanism OSQR.
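To make the idea of summarizing concepts and concept combinations in a Bloom filter concrete, the following is a minimal sketch in Python. It is not the TSBF structure itself, whose two-level layout is not described in this abstract; it uses a single plain Bloom filter. The names `BloomFilter`, `combo_key`, `summarize_peer`, and `may_match`, the filter size, the hash construction, and the limit on combination size are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations


class BloomFilter:
    """Plain Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def combo_key(concepts):
    # Order-independent key for a set of concepts (hypothetical encoding).
    return "+".join(sorted(set(concepts)))


def summarize_peer(documents, max_combo_size=2):
    """Insert every concept and every small concept combination per document."""
    summary = BloomFilter()
    for concepts in documents:
        unique = sorted(set(concepts))
        for size in range(1, max_combo_size + 1):
            for combo in combinations(unique, size):
                summary.add(combo_key(combo))
    return summary


def may_match(summary, query_concepts):
    # True means "possibly holds a document with all these concepts".
    return combo_key(query_concepts) in summary


# Usage: a peer holding two documents annotated with ontology concepts.
peer_summary = summarize_peer([{"jazz", "piano"}, {"opera"}])
print(may_match(peer_summary, ["piano", "jazz"]))
print(may_match(peer_summary, ["opera"]))
```

Capping the combination size in this sketch is one simple way to avoid the exponential growth in index entries that the abstract mentions; a query with more concepts than the cap would need to be decomposed into smaller combinations before testing.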