Attack-resistant frequency counting
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
We present collaborative peer-to-peer algorithms for the problem of approximating frequency counts for popular items distributed across the peers of a large-scale network. Our algorithms are attack-resistant in the sense that they function correctly even in the case where an adaptive and computationally unbounded adversary causes up to a 1/3 fraction of the peers in the network to suffer Byzantine faults. Our algorithms are scalable in the sense that all resource costs are polylogarithmic.
Specifically, with n the number of peers in the network, latency is O(log n); the number of messages and the number of bits sent and received by each peer are O(log^2 n) per item; and the number of neighbors of each peer is O(log^2 n). Our motivation for addressing this problem is to provide a tool for three applications: worm and virus detection, spam detection, and distributed data mining. To the best of our knowledge, our algorithms are the first attack-resistant and scalable algorithms for this problem. Moreover, and perhaps surprisingly, they appear to be the first attack-resistant algorithms for any data-mining problem.

Both Earlybird and Autograph use Rabin fingerprinting and break flow payloads into smaller strings. As a result, they can generate signatures for a large class of polymorphic worms with few false positives and few false negatives. These fingerprints are robust in the sense that they can identify messages that are slight variants of each other.
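To make the fingerprint-and-count idea above concrete, here is a minimal single-machine sketch. It is not the authors' distributed algorithm and not code from Earlybird or Autograph. It assumes fixed-length sliding windows as the way of breaking payloads into smaller strings. It also substitutes a Rabin-Karp-style rolling hash (integer arithmetic modulo a prime) for true Rabin fingerprints, which are polynomials over GF(2). Every k-byte substring is fingerprinted, and a counter reports fingerprints that recur across many payloads, which are the "popular items" a worm detector would care about. The names `rolling_fingerprints`, `direct_fingerprint`, `prevalent_fingerprints`, `WINDOW`, `BASE`, `MOD`, and `threshold` are hypothetical illustration choices. The sketch models no adversary and no peer-to-peer aggregation.

```python
from collections import Counter

# Hypothetical illustration parameters (not taken from the paper).
WINDOW = 8              # substring length k
BASE = 257              # polynomial base; exceeds any byte value
MOD = (1 << 89) - 1     # Mersenne prime 2^89 - 1; larger than 257**8, so 8-byte windows never collide


def direct_fingerprint(s: bytes) -> int:
    """Fingerprint one string from scratch (Horner's rule modulo MOD)."""
    h = 0
    for b in s:
        h = (h * BASE + b) % MOD
    return h


def rolling_fingerprints(payload: bytes, k: int = WINDOW) -> list[int]:
    """Fingerprint every length-k substring in O(len(payload)) using a rolling hash."""
    if len(payload) < k:
        return []
    high = pow(BASE, k - 1, MOD)           # weight of the byte that leaves the window
    h = direct_fingerprint(payload[:k])
    fps = [h]
    for i in range(k, len(payload)):
        h = (h - payload[i - k] * high) % MOD  # remove the outgoing byte
        h = (h * BASE + payload[i]) % MOD      # shift and append the incoming byte
        fps.append(h)
    return fps


def prevalent_fingerprints(payloads, k: int = WINDOW, threshold: int = 2) -> dict[int, int]:
    """Count, for each fingerprint, how many payloads contain it; keep the frequent ones."""
    counts = Counter()
    for p in payloads:
        # Count each distinct substring at most once per payload.
        counts.update(set(rolling_fingerprints(p, k)))
    return {fp: c for fp, c in counts.items() if c >= threshold}


payloads = [b"aaaaEXPLOIT!bbbb", b"xyEXPLOIT!z", b"EXPLOIT!12345", b"harmless traffic"]
popular = prevalent_fingerprints(payloads, threshold=2)
print(len(popular))                                       # 1
print(popular == {direct_fingerprint(b"EXPLOIT!"): 3})    # True
```

The three "worm" payloads differ in their padding, yet the shared 8-byte substring yields the same fingerprint in each. That shared fingerprint is the only one that reaches the threshold. The rolling update keeps the cost linear in payload length rather than quadratic.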