A Study of Privatized Synthetic Data Generation Using Discrete Cosine Transforms

Kato Mivule
2014 International Journal of Advanced Computer Science and Applications  
To comply with data confidentiality requirements while meeting researchers' usability needs, entities face the challenge of publishing privatized data sets that preserve the statistical traits of the original data. One solution to this problem is the generation of privatized synthetic data sets. However, during the data privatization process, the usefulness of the data tends to diminish even as privacy might be guaranteed. Furthermore, researchers have ... that finding an equilibrium between privacy and utility is intractable, often requiring trade-offs. Therefore, as a contribution, the Filtered Classification Error Gauge heuristic is presented. The suggested heuristic is a data privacy and usability model that employs data privacy, signal processing, and machine learning techniques to generate privatized synthetic data sets with acceptable levels of usability. Preliminary results from this study show that it might be possible to generate privacy-compliant synthetic data sets using a combination of data privacy, signal processing, and machine learning techniques while preserving acceptable levels of data usability.
doi:10.14569/ijacsa.2014.051107 fatcat:stn2vr5g4bestbivkgcax7pcpy