Full-text federated search in peer-to-peer networks

Jie Lu
2007 SIGIR Forum  
Peer-to-peer (P2P) networks integrate autonomous computing resources without requiring a central coordinating authority, which makes them a potentially robust and scalable model for providing federated search capability to large-scale networks of text digital libraries. However, P2P networks have so far mostly used simple search techniques based on document names or controlled-vocabulary terms, and provided very limited support for full-text search of document contents. This dissertation
... solutions to full-text federated search with relevance-based document ranking within an integrated framework of P2P network overlay, search, and evolution models. Previous notions of P2P network architectures are extended to define a network overlay model with desired content distribution and navigability. Existing approaches to federated search are adapted, and new methods are developed for resource representation, resource selection, and result merging in a network search model according to the unique characteristics of P2P networks. Furthermore, autonomous and decentralized algorithms to evolve the network topology into one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient full-text federated search in dynamic environments. To demonstrate that the proposed solutions are both effective and practical, two P2P testbeds consisting of thousands of real-content text digital libraries and hundreds of thousands of automatically generated queries are developed. Evaluation using these testbeds provides strong empirical evidence that the approaches proposed in this dissertation provide a better combination of accuracy, efficiency and robustness than more common alternatives.

[...] the environment. Therefore, federated search in P2P networks requires new solutions to extend existing techniques designed for environments with a global control in order to address the problem of how multiple distributed resources work autonomously and collaboratively to accomplish the retrieval task. In addition to the decentralized nature of P2P networks, another characteristic that distinguishes P2P networks from traditional search environments is their dynamic nature.
When peers in a network are permitted to arrive and depart at will, the structure of the network is under constant change, which affects how contents are distributed in the network and how easy it is to navigate from a source peer to a target peer using peer connections. Because P2P networks are decentralized, peers must rely on dynamic self-organization to adjust network structures. New approaches are needed to guide peer organization to achieve desired content distribution and network navigability.

Contributions

We extend previous notions of P2P networks to define a P2P network overlay model with enhanced functionalities in network architecture, and desired content distribution and navigability in network topology. Based on the network architecture extended to support full-text federated search, we develop a network search model to conduct effective and efficient federated search of text digital libraries. A network evolution model is also proposed to describe how a P2P network can dynamically and autonomously evolve into one with the defined network topology to further improve search performance. Together, our network overlay model, network search model, and network evolution model form an integrated framework for full-text federated search of text digital libraries that provides accurate, efficient, robust, and scalable search.

The network overlay model uses hubs (directory services) to define the upper level or backbone of the network and leaves (digital libraries and users) to define the lower level of the network in a two-level hierarchy. Different functionalities of peers lead to different types and properties of connections between them. At the upper level in the hierarchy, the network has locational proximity of similar content areas and short global separation of dissimilar content areas for good navigability.
At the lower level in the hierarchy, connections between digital libraries and hubs are organized to form cohesive content-based clusters for desired content distribution. In addition, connections between users and hubs are established based on users' interests. The key contributions of our network overlay model are i) its explicit recognition of distinctive structural requirements for peers with different functionalities, and ii) its effective integration of several network properties that can enhance search performance in a single architecture, both of which play critical roles in the effort to optimize the overall federated search performance of the network.

The network search model utilizes the network architecture and topology defined in the network overlay model in designing a full-text search mechanism that can offer a better combination of accuracy and efficiency than previous approaches to federated search of text digital libraries in P2P networks. We show in detail that the network search model is not a simple adaptation of existing solutions to full-text ranked retrieval. Its significance lies in our new development for each of the main components (resource representation, resource selection, and result merging) in consideration
doi:10.1145/1273221.1273233 fatcat:lf2ble62arcbvpogfoefryrhn4