Data Sharing Taxonomy Records for Security Conservation

Rajeswari Chandrasekaran, Chandrasekaran Nammalwar
2017 Computer Science & Information Technology (CS & IT)   unpublished
Classification is a fundamental problem in data analysis. Training a classifier requires access to a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of non-identifying attributes such as {Sex, Zip, Birthdate}. A useful approach to combat such linking attacks, called k-anonymization, anonymizes the linking attributes so that at least k released records match each value combination of the linking attributes. Our goal is to find a k-anonymization that preserves the classification structure. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
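
The k-anonymity condition described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy records, the helper names `is_k_anonymous` and `generalize`, and the generalization rule (masking the last two Zip digits and keeping only the birth year) are assumptions made for illustration.

```python
from collections import Counter

def is_k_anonymous(records, linking_attributes, k):
    """Return True if every combination of linking-attribute values
    present in `records` is shared by at least k records."""
    counts = Counter(
        tuple(record[attr] for attr in linking_attributes) for record in records
    )
    return all(count >= k for count in counts.values())

def generalize(record):
    """Hypothetical generalization step: mask the last two Zip digits
    and reduce Birthdate to the year. The Class label is left untouched."""
    out = dict(record)
    out["Zip"] = record["Zip"][:3] + "**"
    out["Birthdate"] = record["Birthdate"][:4]
    return out

# Toy data invented for this sketch; "Class" stands in for the label a
# classifier would be trained to predict.
records = [
    {"Sex": "F", "Zip": "53711", "Birthdate": "1980-02-14", "Class": "yes"},
    {"Sex": "F", "Zip": "53715", "Birthdate": "1980-11-03", "Class": "no"},
    {"Sex": "M", "Zip": "53703", "Birthdate": "1975-06-21", "Class": "yes"},
    {"Sex": "M", "Zip": "53706", "Birthdate": "1975-01-09", "Class": "yes"},
]
linking = ["Sex", "Zip", "Birthdate"]

raw_ok = is_k_anonymous(records, linking, k=2)
generalized = [generalize(r) for r in records]
gen_ok = is_k_anonymous(generalized, linking, k=2)
print(raw_ok, gen_ok)
```

In this toy data each raw record is unique on {Sex, Zip, Birthdate}, so the check fails for k = 2; after generalization the records fall into two groups of two, and the check passes. A classification-preserving approach would additionally choose among candidate generalizations by how well they retain the relationship between the attributes and the class label, which this sketch does not attempt.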
doi:10.5121/csit.2017.70205 fatcat:4lyl3l3iojhr7krzsjxcffx6nu