Efficient anonymity-preserving data collection
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06
The output of a data mining algorithm is only as good as its inputs, and individuals are often unwilling to provide accurate data about sensitive topics such as medical history and personal finance. Individuals may be willing to share their data, but only if they are assured that it will be used in an aggregate study and that it cannot be linked back to them. Protocols for anonymity-preserving data collection provide this assurance, in the absence of trusted parties, by allowing a set of
... y distrustful respondents to anonymously contribute data to an untrusted data miner. To effectively provide anonymity, a data collection protocol must be collusion resistant, which means that even if all dishonest respondents collude with a dishonest data miner in an attempt to learn the associations between honest respondents and their responses, they will be unable to do so. To achieve collusion resistance, previously proposed protocols for anonymity-preserving data collection have quadratically many communication rounds in the number of respondents, and employ (sometimes incorrectly) complicated cryptographic techniques such as zero-knowledge proofs. We describe a new protocol for anonymity-preserving, collusion resistant data collection. Our protocol has linearly many communication rounds, and achieves collusion resistance without relying on zero-knowledge proofs. This makes it especially suitable for data mining scenarios with a large number of respondents.