Optimizing Privacy-Accuracy Tradeoff for Privacy Preserving Distance-Based Classification [article]

Dongjin Kim, Zhiyuan Chen, Aryya Gangopadhyay, Maryland Shared Open Access Repository
Privacy concerns often prevent organizations from sharing data for data mining. A rich literature on privacy-preserving data mining offers techniques that protect privacy while still allowing accurate mining. Many of these techniques have parameters that must be set correctly to achieve the desired balance between privacy protection and the quality of mining results, yet there has been little research on how to tune these parameters effectively. This paper studies the problem of tuning the group size parameter for a popular privacy-preserving distance-based mining technique, the condensation method. The contributions are: 1) a class-wise condensation method that selects an appropriate group size based on heuristics and avoids generating groups with mixed classes; and 2) a rule-based approach that uses binary search and several rules to further optimize the group size setting. The experimental results demonstrate the effectiveness of the authors' approach.
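As a rough illustration of the two ideas named in the abstract, the sketch below groups records within each class and then binary-searches a group size. It is not the authors' algorithm: the function names `classwise_condense` and `largest_acceptable_k`, the centroid replacement (instead of synthetic data regenerated from group statistics), the nearest-neighbor grouping, and the assumption that accuracy does not increase as the group size grows are all assumptions made for this example. The paper's heuristics and rules are not reproduced here.

```python
import math
from collections import defaultdict


def classwise_condense(records, labels, k):
    """Condense records class by class (hypothetical simplification).

    Records sharing a label are split into groups of about k members,
    and each group is summarized by its centroid, so no group mixes
    classes. A class with fewer than k records still forms one
    (smaller) group.
    """
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(tuple(rec))

    condensed = []  # entries are (centroid, label, group_size)
    for lab, recs in by_class.items():
        remaining = list(recs)
        while remaining:
            if len(remaining) < 2 * k:
                # Too few records left for two full groups: keep them together.
                group, remaining = remaining, []
            else:
                # Take the first record and its k - 1 nearest neighbors.
                seed = remaining[0]
                remaining.sort(key=lambda r: math.dist(seed, r))
                group, remaining = remaining[:k], remaining[k:]
            dim = len(group[0])
            centroid = tuple(
                sum(r[d] for r in group) / len(group) for d in range(dim)
            )
            condensed.append((centroid, lab, len(group)))
    return condensed


def largest_acceptable_k(accuracy_of, k_min, k_max, min_accuracy):
    """Binary-search the largest k whose accuracy meets the threshold.

    Assumes accuracy_of(k) is non-increasing in k, which is an
    assumption of this sketch rather than a claim from the paper.
    Returns None if no k in [k_min, k_max] qualifies.
    """
    best = None
    lo, hi = k_min, k_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if accuracy_of(mid) >= min_accuracy:
            best = mid      # mid is acceptable; try a larger group size
            lo = mid + 1
        else:
            hi = mid - 1    # accuracy too low; try a smaller group size
    return best
```

In this sketch a larger group size stands in for stronger privacy protection, so the search looks for the largest size that still meets a chosen accuracy threshold; `accuracy_of` would be supplied by the caller, for example by training a distance-based classifier on the condensed data.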
doi:10.13016/m2jng5-waqt fatcat:wu4v2v35tve3vp2zikh7szuwd4