Towards a Unifying Framework for Complex Query Processing over Structured Peer-to-Peer Data Networks [chapter]

Peter Triantafillou, Theoni Pitoura
2004 Lecture Notes in Computer Science  
In this work we study how to process complex queries in DHT-based Peer-to-Peer (P2P) data networks. Queries are made over tuples and relations and are expressed in a query language such as SQL. We describe existing research approaches for query processing in P2P systems, suggest improvements and enhancements, and propose a unifying framework. The framework consists of a modified DHT architecture together with data placement and search algorithms, and it efficiently supports a variety of query types: queries with one or more attributes, queries with selection operators (equality and range queries), and queries with join operators. To our knowledge, this is the first work that puts forth a framework supporting all these query types.

Recently, P2P architectures based on Distributed Hash Tables (DHTs) have been proposed and have since become very popular, significantly influencing research in P2P systems. DHT-based systems efficiently process routing/location operations: given a query for a document id, they locate (route the query to) the peer node that stores this document. Thus, they provide support for exact-match queries. To do so, they generally rely on lookups in a distributed hash table, which imposes a structure on the system that emerges from the way peers define their neighbors. For this reason, they are referred to as structured P2P systems, as opposed to systems like Gnutella [1], MojoNation [2], etc., where there is no such structure and neighbors of peers are instead defined in rather ad hoc ways.

There are several DHT-based P2P architectures (Chord [3], CAN [4], Pastry [5], Tapestry [6], etc.). Of these, CAN and Chord are the most commonly used as a substrate upon which to develop higher layers supporting more elaborate queries. CAN [4] uses a d-dimensional virtual address space for data location and routing. Each peer in the system owns a zone of the virtual space and stores the data objects that are mapped into its zone. Each peer stores routing information about O(d) other peers, independent of the number of peers, N, in the system. Each data object is mapped to a point in the d-dimensional space, and a request for it is routed towards that point in the virtual space. Each peer on the path passes the
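The CAN-style location and routing summarized above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the mechanism of [4] or of this chapter: the virtual space is taken to be the unit d-dimensional torus split into a static, regular grid of k^d equal zones (one zone per peer, so N = k^d), SHA-256 stands in for the uniform hash function, and each peer greedily forwards a request to the adjacent zone closest to the target. All function names are hypothetical. In real CAN, zones are created by splitting existing zones as peers join and are generally unequal in size.

```python
import hashlib
from typing import List, Tuple

Cell = Tuple[int, ...]  # index of a zone in a regular k x ... x k grid (toy assumption)


def key_to_point(key: str, d: int) -> Tuple[float, ...]:
    """Hash a key to a point in the unit d-dimensional torus [0, 1)^d.

    Assumption: SHA-256 is the uniform hash; each coordinate uses 4 bytes
    of the 32-byte digest, so this toy layout supports d <= 8.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    if d > len(digest) // 4:
        raise ValueError("d too large for this toy hash layout")
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32 for i in range(d)
    )


def owner_zone(point: Tuple[float, ...], k: int) -> Cell:
    """Zone (grid cell) whose box contains the point."""
    return tuple(min(int(x * k), k - 1) for x in point)


def neighbors(cell: Cell, k: int) -> List[Cell]:
    """Zones adjacent along one dimension, with wrap-around: at most 2d of them."""
    result = set()
    for dim in range(len(cell)):
        for step in (-1, 1):
            n = list(cell)
            n[dim] = (n[dim] + step) % k
            result.add(tuple(n))
    result.discard(cell)  # with k == 1 a cell would otherwise neighbor itself
    return sorted(result)


def torus_distance(a: Cell, b: Cell, k: int) -> int:
    """Minimum number of zone-to-zone hops between two cells on the wrapped grid."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def route(start: Cell, key: str, k: int) -> List[Cell]:
    """Greedy routing: each peer forwards to its neighbor closest to the key's zone."""
    target = owner_zone(key_to_point(key, len(start)), k)
    path = [start]
    current = start
    while current != target:
        current = min(neighbors(current, k), key=lambda c: torus_distance(c, target, k))
        path.append(current)
    return path


if __name__ == "__main__":
    # Usage: a 2-dimensional space split into a 4 x 4 grid of zones (16 peers).
    hops = route(start=(0, 0), key="doc-42", k=4)
    print("target zone:", hops[-1], "hops:", len(hops) - 1)
```

In this sketch each peer keeps at most 2d neighbors regardless of N, and every hop reduces the remaining grid distance by exactly one cell, so a request needs at most d * floor(k/2) hops. The path length therefore grows with d * N^(1/d) rather than with N, while per-peer routing state stays O(d).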
doi:10.1007/978-3-540-24629-9_13 fatcat:zichaif3lzfsrirnehgyrkg4di