Efficient peer-to-peer semantic overlay networks based on statistical language models

Alessandro Linari, Gerhard Weikum
2006 Proceedings of the international workshop on Information retrieval in peer-to-peer networks - P2PIR '06  
In this paper we address the query routing problem in peerto-peer (P2P) information retrieval. Our system builds up on the idea of a Semantic Overlay Network (SON), in which each peer becomes neighbor of a small number of peers, chosen among those that are most similar to it. Peers in the network are represented by a statistical Language Model derived from their local data collections but, instead of using the non-metric Kullback-Leibler divergence to compute the similarity between them, we use
more » ... a symmetrized and "metricized" related measure, the square root of the Jensen-Shannon divergence, which let us map the problem to a metric search problem. The search strategy exploits the triangular inequality to efficiently prune the search space and relies on a priority queue to visit the most promising peers first. To keep communications costs low and to perform an efficient comparison between Language Models, we devise a compression technique that builds on Bloom-filters and histograms and we provide error bounds for the approximation and a cost analysis for the algorithms used to build and maintain the SON.
doi:10.1145/1183579.1183582 fatcat:iu3thxomxrhf3fwxf3hqwttloi