A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel

Ni Ding, Parastoo Sadeghi
<span title="2019-02-12">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
For the privacy funnel (PF) problem, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). For a data curator who wants to share data X correlated with sensitive information S, the PF problem is to generate sanitized data X̂ that maintains a specified utility/fidelity threshold on I(X; X̂) while minimizing the privacy leakage I(S; X̂). Our IAC-MDSF algorithm starts with the original alphabet X̂ := X and iteratively merges the subset of elements in the current alphabet X̂ whose merge minimizes the Lagrangian function I(S; X̂) − λ I(X; X̂). We prove that the best merge in each iteration of IAC-MDSF can be found efficiently over all subsets of X̂ by existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the Lagrangian multiplier λ, we obtain experimental results on a heart disease data set, reported as the Pareto frontier of I(S; X̂) versus −I(X; X̂). We show that our IAC-MDSF algorithm outperforms existing iterative pairwise-merge approaches for both PF and IB and is computationally much less complex.
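The merge loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces the MDSF solver with an exhaustive search over all subsets of the current alphabet. That search is exponential in |X̂| and is only usable on toy alphabets. Several details are assumptions made for illustration. The joint pmf is given as a list `p_sx[s][x]`. X̂ is the deterministic cluster index of X, so I(X; X̂) = H(X̂). Information is measured in bits. The loop stops when no merge strictly lowers the Lagrangian. The function names `mutual_info_s_xhat`, `entropy_xhat`, `lagrangian`, and `agglomerate` are hypothetical.

```python
from itertools import combinations
from math import log2


def mutual_info_s_xhat(p_sx, groups):
    """I(S; X̂) in bits, where X̂ is the index of the group that contains X."""
    n_s = len(p_sx)
    p_s = [sum(row) for row in p_sx]
    # Joint p(s, x̂): add up p(s, x) over the x values in each group.
    p_sg = [[sum(p_sx[s][x] for x in g) for g in groups] for s in range(n_s)]
    p_g = [sum(p_sg[s][j] for s in range(n_s)) for j in range(len(groups))]
    total = 0.0
    for s in range(n_s):
        for j in range(len(groups)):
            if p_sg[s][j] > 0:
                total += p_sg[s][j] * log2(p_sg[s][j] / (p_s[s] * p_g[j]))
    return total


def entropy_xhat(p_sx, groups):
    """H(X̂) in bits. It equals I(X; X̂) because X̂ is a deterministic function of X."""
    p_g = [sum(p_sx[s][x] for s in range(len(p_sx)) for x in g) for g in groups]
    return -sum(q * log2(q) for q in p_g if q > 0)


def lagrangian(p_sx, groups, lam):
    """Objective I(S; X̂) - lam * I(X; X̂) for a given partition of X's alphabet."""
    return mutual_info_s_xhat(p_sx, groups) - lam * entropy_xhat(p_sx, groups)


def agglomerate(p_sx, lam):
    """Greedy agglomerative merging that starts from X̂ := X (all singletons)."""
    groups = [frozenset([x]) for x in range(len(p_sx[0]))]
    current = lagrangian(p_sx, groups, lam)
    while len(groups) > 1:
        best_val, best_groups = current, None
        # Stand-in for the MDSF solver: try merging every subset of >= 2 groups.
        for k in range(2, len(groups) + 1):
            for subset in combinations(range(len(groups)), k):
                merged = frozenset().union(*(groups[i] for i in subset))
                rest = [g for i, g in enumerate(groups) if i not in subset]
                candidate = rest + [merged]
                val = lagrangian(p_sx, candidate, lam)
                if val < best_val - 1e-12:
                    best_val, best_groups = val, candidate
        # Assumed stopping rule: stop when no merge strictly decreases the objective.
        if best_groups is None:
            break
        groups, current = best_groups, best_val
    return groups, current


if __name__ == "__main__":
    # Toy pmf: x in {0, 2} occurs only with s = 0, and x in {1, 3} only with s = 1.
    p = [[0.25, 0.0, 0.25, 0.0], [0.0, 0.25, 0.0, 0.25]]
    print(agglomerate(p, 0.1))
    print(agglomerate(p, 1.0))
```

In this toy case a small λ (0.1) weights privacy heavily, and the sketch merges the whole alphabet into one symbol, which drives I(S; X̂) to zero. With λ = 1, no merge strictly improves on the singleton partition, so the original alphabet is kept. According to the abstract, the point of IAC-MDSF is that this per-iteration subset search can be solved by existing MDSF algorithms rather than by the brute-force enumeration used here.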
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1901.06629v2">arXiv:1901.06629v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/r355dwp75bfz5mrmd7ofn7rlmq">fatcat:r355dwp75bfz5mrmd7ofn7rlmq</a> </span>
