Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems
IEEE Transactions on Parallel and Distributed Systems
Distributed peer-to-peer storage systems rely on voluntary participation of peers to effectively manage a storage pool. Files are generally replicated at several sites to provide acceptable levels of availability. If disk space on these peers is not carefully monitored and provisioned, the system may not be able to provide availability for certain files. In particular, identification and elimination of redundant data are important problems that may arise in long-lived systems. Scalability and availability are competing goals in these networks: scalability concerns would dictate aggressive elimination of replicas, while availability considerations would argue conversely. In this paper, we provide a novel and efficient solution that addresses both of these goals with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location. We propose a new randomized protocol to solve this problem in a scalable and decentralized fashion that does not compromise availability requirements of the application. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that the protocols provide high probabilistic guarantees of success while incurring minimal administrative overheads.