On unstructured distributed search over BitTorrent

William Mayor, Ingemar Cox
IEEE P2P 2013 Proceedings
Current methods for discovering BitTorrent tracking data rely on either centralised systems or structured peer-to-peer (P2P) networks. These methods have security weaknesses that can be exploited to censor or remove information from the network. To mitigate this threat, we propose incorporating an unstructured P2P information discovery mechanism that can be used if the centralised or structured P2P mechanisms are compromised. Unstructured P2P information discovery has fewer security weaknesses. However, in this case the search is both probabilistic and approximately correct (PAC), since it is not practical to perform an exhaustive search. The performance of PAC search depends strongly on the distribution of documents in the network. To determine whether PAC search is practical over BitTorrent, we first conducted a 64-day study of BitTorrent activity, observing the distribution of 1.6 million torrents across 5.4 million peers. We found that the distribution of torrents follows a power law, which is not amenable to PAC search. To address this, we introduce a simple modification to BitTorrent that enables each peer to index a random subset of tracking data, i.e., torrent IDs and their lists of participating nodes. A successful search is then one that finds a peer holding the tracking data, rather than a peer directly participating in the torrent. We show that the distribution of this tracking data can support accurate PAC search for torrents. We assess the overheads introduced by our extension and conclude that it would require only small amounts of bandwidth, which current home broadband connections can easily provide. We also simulate our extension to verify our model and to explore its capabilities in different situations. We demonstrate that our extension can satisfy 99% of queries for popular torrents, and can discover torrents present on as few as 10 of 5 million nodes after only 8 repeated queries to 100 random nodes.
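The abstract's argument rests on how PAC search success depends on replication: an item is found only if a query reaches one of the nodes holding it. As a hedged illustration, the sketch below assumes a simple random-sampling model (our assumption, not necessarily the paper's exact model): n nodes, r of which hold the item, each query sent to z distinct random nodes, and k independent repeated queries. The success probability is then 1 - (C(n-r, z) / C(n, z))^k, commonly approximated as 1 - (1 - r/n)^(z*k). The function names and the 1% replication figure for tracking data are hypothetical, chosen only to contrast searching for a torrent's 10 participating peers with searching for peers that hold its tracking data.

```python
"""Success probability of a PAC search under a simple random-sampling model.

Assumptions (not taken from the paper): n nodes, an item is held by r distinct
nodes, each query contacts z distinct nodes chosen uniformly at random, and
repeated queries choose their node sets independently.
"""


def single_query_miss(n: int, r: int, z: int) -> float:
    """Probability that z distinct random nodes include none of the r holders.

    This is the hypergeometric term C(n - r, z) / C(n, z), evaluated as a
    running product to avoid huge binomial coefficients.
    """
    if not (0 <= r <= n and 0 <= z <= n):
        raise ValueError("require 0 <= r <= n and 0 <= z <= n")
    if z > n - r:
        return 0.0  # too few non-holders: every sample must hit a holder
    miss = 1.0
    for i in range(z):
        miss *= (n - r - i) / (n - i)
    return miss


def pac_success(n: int, r: int, z: int, k: int = 1) -> float:
    """Probability that at least one of k independent queries finds a holder."""
    return 1.0 - single_query_miss(n, r, z) ** k


def pac_success_approx(n: int, r: int, z: int, k: int = 1) -> float:
    """Common approximation 1 - (1 - r/n)^(z*k), i.e. sampling with replacement."""
    return 1.0 - (1.0 - r / n) ** (z * k)


if __name__ == "__main__":
    n, z, k = 5_000_000, 100, 8
    # Direct search for the 10 peers participating in a rare torrent.
    print(f"direct search, r=10:     {pac_success(n, 10, z, k):.4f}")
    # Hypothetical: the torrent's tracking data replicated on 1% of peers.
    print(f"tracking data, r=50,000: {pac_success(n, 50_000, z, k):.4f}")
```

With the abstract's parameters (n = 5 million, z = 100, k = 8), this model gives a success probability well under 1% for a torrent held by only 10 peers, while widely replicated tracking data pushes it close to 1. Evaluating the product term by term avoids computing binomial coefficients with hundreds of digits.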
doi:10.1109/p2p.2013.6688715 dblp:conf/p2p/MayorC13