Attribute association based privacy preservation for multi trust level environment

R PRAVEENA PRIYADARSINI, M L VALARMATHI, S SIVAKUMARI
2015 Sadhana (Bangalore)  
An enormous amount of e-data is collected worldwide by organizations for research and decision making. The availability of this heterogeneous, sensitive information in e-databases poses a threat to the privacy of the individuals or organizations about whom the data is collected. Privacy Preserving Data Mining [PPDM] is a field of research that concentrates on preserving data privacy during the process of data mining. This paper proposes a two-level partition and perturbation framework to release multiple copies of privacy-preserved datasets in a Multi Trust Level [MTL] scenario that can prevent linking and diversity attacks. The framework proposes two methods, namely Entropy based Attribute Privacy Preservation [EAPP] and Information Gain based Attribute Privacy Preservation [IGAPP], for privacy preservation in an MTL environment. Both methods perform vertical and horizontal partitioning of the data. The simple K-Means clustering algorithm with cluster size 2, using both Euclidean and Manhattan distance functions, is used for horizontal partitioning. The vertical partitioning of attributes within each cluster is based on the attributes' entropy values, which indicate their one-way association with the class, in the EAPP method, and on their Information Gain [IG] values, which indicate their two-way association with the class, in the IGAPP method. The attributes in the clusters are then subjected to a privacy preservation technique based on their entropy and IG values in the EAPP and IGAPP methods, respectively. The effect of the clustering distance function on privacy preservation, and the ability of the privacy-preserved datasets generated by the proposed methods to prevent privacy attacks, are studied using variance, rank distortion and utility metrics. Real-life medical datasets and the benchmark Adult dataset are used for experimentation. The results show that the generated datasets exhibit good variance and rank distortion values and hence can prevent diversity and linking attacks in an MTL environment. Also, the privacy-preserved datasets have utility comparable to that of the original and L-Diversified datasets on selected classification and clustering algorithms.
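The abstract does not give the formulas behind the entropy and Information Gain scores, so the following is only a minimal sketch under stated assumptions: attributes are discrete, entropy is the standard Shannon entropy, and IG is the usual H(class) - H(class | attribute). The function names and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(attribute, labels):
    """H(labels) - H(labels | attribute) for discrete values (assumed definition)."""
    n = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Weighted average of the class entropy inside each group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records: two attributes and one class column.
age_band = ["young", "young", "old", "old"]
city = ["A", "B", "A", "B"]
diagnosis = ["neg", "neg", "pos", "pos"]

ig_age = information_gain(age_band, diagnosis)
ig_city = information_gain(city, diagnosis)
```

In this toy data `age_band` fully determines `diagnosis` while `city` carries no information about it, so a ranking by IG would place `age_band` first. How the paper maps such scores to a perturbation technique per attribute is not stated in the abstract and is not modelled here.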
doi:10.1007/s12046-015-0412-4 fatcat:ah4tav66cfadlgnzqxoqa7h7wq