Message Passing in Semantic Peer-to-Peer Overlay Networks [Exploratory DSP]

Philippe Cudre-Mauroux, Karl Aberer
2007 IEEE Signal Processing Magazine  
Peer-to-peer (P2P) systems rely on ad hoc machine-to-machine communication to offer services to a community. Unlike the classical client-server architecture, P2P systems treat all peers, i.e., all nodes participating in the network, as equals. Hence, a peer can simultaneously act as a client consuming resources from the system and as a server providing resources to the community. P2P applications run on top of existing routing infrastructures, typically on top of the IP ... k, and organize peers into logical, decentralized structures called overlay networks. In this column, we discuss exploratory research on data management in P2P overlay networks. First, we introduce the notions of unstructured and structured P2P overlay networks. Then, we discuss data management in such networks by introducing an additional layer that handles semantic heterogeneity and data integration. Finally, we present a method based on sum-product message passing to detect inconsistent information in this setting.

P2P OVERLAY NETWORKS-ARCHITECTURE

Increasingly, Internet applications are supported by sets of loosely connected machines operating without any form of central coordination; Internet telephony networks such as Skype [1] and file-sharing applications such as Gnutella [2] are two well-known examples of this trend. Unlike the client-server setting, where applications are bound to sets of static servers identified by IP addresses, these applications need ways to organize the dynamic set of machines providing the service. P2P overlay networks address this need by managing virtual, decentralized networks built on top of the IP infrastructure.

The virtual structure connecting the peers in an overlay network can vary. In unstructured overlay networks such as Gnutella, each peer establishes connections to a fixed number of other peers, yielding a random graph of P2P connections. A request originating from one peer is forwarded cooperatively by the other peers, as depicted in Figure 1(a). This relatively simple and robust mechanism is, however, network-intensive: it broadcasts every query to all peers within a certain radius (number of hops), irrespective of the query's content. Structured overlay networks were introduced to reduce this traffic while maximizing the probability that a query locates a specific peer.
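The radius-limited flooding of Figure 1(a) can be simulated in a few lines to show why it is network-intensive. This is a minimal sketch, not the Gnutella protocol itself: the helper names `random_overlay` and `flood`, the bidirectional links, and the hop budget `ttl` are illustrative assumptions. Every forwarded copy is counted as one message, including copies that arrive at peers that have already seen the query.

```python
import random
from collections import deque


def random_overlay(num_peers, degree, seed=0):
    """Hypothetical helper: each peer opens `degree` links to randomly chosen
    other peers; links are treated as bidirectional, so some peers end up
    with more than `degree` neighbors."""
    rng = random.Random(seed)
    neighbors = {p: set() for p in range(num_peers)}
    for p in range(num_peers):
        others = [q for q in range(num_peers) if q != p]
        for q in rng.sample(others, degree):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors


def flood(neighbors, source, ttl):
    """Flooding with a hop budget: a peer receiving the query for the first
    time forwards it to all neighbors except the sender while the remaining
    TTL is positive. Returns (set of peers reached, messages sent)."""
    seen = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # FIFO, so peers are first reached via shortest paths
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue
            messages += 1        # every forwarded copy costs one packet
            if nxt not in seen:  # duplicates are counted but not re-forwarded
                seen.add(nxt)
                queue.append((nxt, peer, remaining - 1))
    return seen, messages


if __name__ == "__main__":
    overlay = random_overlay(num_peers=1000, degree=4, seed=42)
    reached, sent = flood(overlay, source=0, ttl=3)
    print(f"peers reached: {len(reached)}, messages sent: {sent}")
```

The message count grows roughly with the node degree raised to the hop budget, whether or not any reached peer actually holds the requested content; this is the cost that structured overlays aim to avoid.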
Peers in a structured overlay can, for example, be organized on a multidimensional torus [3] or into a virtual binary search tree, as in the P-Grid P2P system [4] and illustrated in Figure 1(b). Such systems provide hash-table functionality at Internet scale and are known as distributed hash tables (DHTs). They typically support global search over shared data items in a fully decentralized way using O(log N) messages (i.e., packets sent from one peer to another), where N is the number of peers in the overlay.

P2P OVERLAY NETWORKS-DATA MANAGEMENT

SEMANTIC HETEROGENEITY

P2P overlays originally dealt with very simple data and query models: only file names were shared, and queries consisted of a single hash value or keyword. Several research efforts [4] soon attempted to enrich overlay networks with more expressive models supporting structured data that conforms to schemas. In ...

1053-5888/07/$25.00 © 2007 IEEE

[FIG1] Two P2P architectures: (a) an unstructured P2P overlay à la Gnutella, where a query originating from a peer on the left-hand side of the figure is iteratively gossiped up to three times; and (b) a structured distributed hash table à la P-Grid, where peers are organized into a virtual binary tree.
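The O(log N) lookup cost of a tree-structured overlay such as the one in Figure 1(b) can be illustrated with a simplified prefix-routing simulation. This is a sketch under several assumptions and is not P-Grid's actual protocol: the tree is perfectly balanced with exactly one peer per leaf, keys are hashed with SHA-1 (an arbitrary choice), each peer keeps a single reference per tree level, and the names `key_bits`, `build_trie_overlay`, and `route` are hypothetical. Each hop fixes at least one more bit of the target key, so a lookup needs at most `depth` = log2 N hops.

```python
import hashlib
import random


def key_bits(key: str, depth: int) -> str:
    """Hash a data key to a bit string and keep the first `depth` bits
    (assumption: SHA-1 is used as the hash function)."""
    digest = hashlib.sha1(key.encode()).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:depth]


def build_trie_overlay(depth: int, seed: int = 0):
    """Hypothetical balanced overlay with 2**depth peers; peer i is responsible
    for the `depth`-bit path of i. refs[path][level] points to a random peer
    that agrees with `path` on the first `level` bits and differs at bit `level`."""
    rng = random.Random(seed)
    paths = [format(i, f"0{depth}b") for i in range(2 ** depth)]
    refs = {}
    for p in paths:
        refs[p] = []
        for level in range(depth):
            flipped = p[:level] + ("1" if p[level] == "0" else "0")
            suffix_len = depth - level - 1
            suffix = format(rng.getrandbits(suffix_len), f"0{suffix_len}b") if suffix_len else ""
            refs[p].append(flipped + suffix)
    return paths, refs


def route(refs, start: str, target: str):
    """Prefix routing: forward to the reference for the first bit where the
    current peer's path and the target key disagree. Returns (owner, hops)."""
    current, hops = start, 0
    while current != target:
        level = next(i for i, (a, b) in enumerate(zip(current, target)) if a != b)
        current = refs[current][level]  # common prefix with target grows by >= 1 bit
        hops += 1
    return current, hops


if __name__ == "__main__":
    depth = 10  # 1024 peers
    paths, refs = build_trie_overlay(depth, seed=7)
    target = key_bits("some-file.mp3", depth)
    owner, hops = route(refs, start=paths[0], target=target)
    print(f"key -> {target}, owner {owner}, {hops} hops (bound: {depth})")
```

Compared with flooding, the number of messages per lookup here is bounded by the tree depth rather than by the size of a broadcast neighborhood.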
doi:10.1109/msp.2007.904799