Privacy Preserving Data Mining Technique and Their Implementation
International Journal of Research Studies in Computer Science and Engineering
Privacy preservation in data mining has gained significant recognition because of the increased concerns to ensure privacy of sensitive information. It enables multiple parties to conduct collaborative data mining while preserving the privacy of their data. In this work, a cloud computing based protocol for privacypreserving distributed K-means clustering over horizontally partitioned data, shared between N parties, is proposed. Clustering is one of the elementary algorithms used in the field
... data mining. Traditional cryptographic methods use encryption techniques or secure multiparty computation (SMC) to ensure privacy of data. But privacy in these techniques is at the expense of additional communication cost, which limits their use in practical applications. Hence, to reduce these overheads, threshold cryptography is used in the proposed work as a privacy-preserving mechanism. The proposed scheme is faster as compared to the previous schemes and experimental results presented in this paper. The first category takes the data and replaces it from the same distribution sample or from the distribution itself, i.e., probability distribution approach, or by adding noise, i.e., value distortion approach. The latter category makes use of cryptographic protocols. The solution should not just be secure, i.e., it leaks, no additional useful information, but should also minimize the additional overheads in terms of communication and computation costs required to introduce privacy. The schemes using randomization techniques achieve partial privacy, but the advantage is that the communication costs are negligible. The schemes using cryptographic techniques,like Homomorphic Encryption and SMC, provide complete privacy but are computationally expensive. Hence, it is required to explore other techniques that have lesser overheads. The work is focused on giving a privacy-preserving distributed K-means clustering algorithm which uses secret sharing paradigm to provide privacy. Methods using the secret sharing paradigm have very less communication overhead as compared to the traditional cryptographic techniques and hence are faster. A secret sharing scheme allows the splitting of a secret s into different pieces, which are called shares. These shares are distributed among a set of players or participants, denoted by P, such that only certain subsets of players can recover the secret using their respective shares.