DPWeka: Achieving Differential Privacy in WEKA

Srinidhi Katla, Depeng Xu, Yongkai Wu, Qiuping Pan, Xintao Wu
2017 IEEE Symposium on Privacy-Aware Computing (PAC)
Organizations in the government, commercial, and non-profit sectors collect and store large amounts of sensitive data, including medical, financial, and personal information. They use data mining methods to formulate business strategies that yield high long-term and short-term financial benefits. When such data is analyzed, the private information of the individuals it describes must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. Differential privacy, a privacy protection framework, provides mathematical guarantees against adversarial attacks even in the presence of additional background information. Existing platforms for differential privacy employ specific mechanisms for limited applications of data mining. Additionally, widely used data mining tools do not contain differentially private data mining algorithms. As a result, awareness of differentially private methods for analyzing sensitive data is currently limited outside the research community. This thesis examines various mechanisms to realize differential privacy in practice and investigates methods to integrate them with a popular machine learning toolkit, WEKA. We present DPWeka, a package that provides differential privacy capabilities to WEKA for practical data mining. DPWeka includes a suite of differential-privacy-preserving algorithms that support a variety of data mining tasks, including attribute selection and regression analysis. It lets users control privacy and model parameters, such as the privacy mechanism, the privacy budget, and other algorithm-specific variables. We evaluate the private algorithms on real-world datasets, such as genetic data and census data, to demonstrate the practical applicability of DPWeka.

Acknowledgements

There are a number of people I want to thank for their help with my thesis. First of all, I would like to express my immense gratitude to my advisor, Dr. Xintao Wu. This project would not have been possible without his support and encouragement. He contributed to virtually all ideas in this thesis. In spite of his overwhelming schedule, he made time to thoroughly review my work, pointing out where I had gone wrong and where things could be improved. I also knew when I did well, because he let me know.
His wisdom, passion, dedication, and attention to detail are awe-inspiring and constantly motivated me to raise my standards. He certainly provided more than any supervisor would. Secondly, I would like to thank my lab-mates. They are a daily reminder that hard work and persistence lead to success. In fact, Shuhan Yuan's and Qiuping Pan's constant question, "Have you finished your thesis?", motivated me to push ahead with my thesis. I would like to thank Depeng Xu and Qiuping Pan for all the discussions on differential privacy and genetic databases. They certainly made wading through the dense concepts a lot easier. As a matter of fact, the general discussions and the regular group meetings with everyone in the lab (Yongkai Wu, Panpan Zheng, Dr. Lu Zhang, Nghia Nguyen) were a constant source of learning.
doi:10.1109/pac.2017.25 dblp:conf/pac/KatlaXWPW17 fatcat:ghi6wu7si5fl5fonqfo57qhpnu