Anonymity-preserving data collection

Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
Protection of privacy has become an important problem in data mining. In particular, individuals have become increasingly unwilling to share their data, frequently resulting in individuals either refusing to share their data or providing incorrect data. In turn, such problems in data collection can affect the success of data mining, which relies on sufficient amounts of accurate data in order to produce meaningful results. Random perturbation and randomized response techniques can provide some
more » ... evel of privacy in data collection, but they have an associated cost in accuracy. Cryptographic privacy-preserving data mining methods provide good privacy and accuracy properties. However, in order to be efficient, those solutions must be tailored to specific mining tasks, thereby losing generality. In this paper, we propose efficient cryptographic techniques for online data collection in which data from a large number of respondents is collected anonymously, without the help of a trusted third party. That is, our solution allows the miner to collect the original data from each respondent, but in such a way that the miner cannot link a respondent's data to the respondent. An advantage of such a solution is that, because it does not change the actual data, its success does not depend on the underlying data mining problem. We provide proofs of the correctness and privacy of our solution, as well as experimental data that demonstrates its efficiency. We also extend our solution to tolerate certain kinds of malicious behavior of the participants.
doi:10.1145/1081870.1081909 dblp:conf/kdd/YangZW05 fatcat:ft5g7tjcufct3kub3bxhlazshi