Publishing spatial histograms under differential privacy

Soheila Ghane, Lars Kulik, Kotagiri Ramamohanarao
2018 Proceedings of the 30th International Conference on Scientific and Statistical Database Management - SSDBM '18  
Counting the fraction of a population having an input within a specified interval, i.e., a range query, is a fundamental data analysis primitive. Range queries can also be used to compute other core statistics such as quantiles, and to build prediction models. However, the data is frequently subject to privacy concerns when it is drawn from individuals and relates, for example, to their financial, health, religious, or political status. In this paper, we introduce and analyze methods to support range queries under the local variant of differential privacy [23], an emerging standard for privacy-preserving data analysis. The local model requires that each user releases a noisy view of her private data under a privacy guarantee. While many works address the problem of range queries in the trusted aggregator setting, this problem has not been addressed specifically under the untrusted aggregation (local DP) model, even though many primitives have been developed recently for estimating a discrete distribution. We describe and analyze two classes of approaches for range queries, based on hierarchical histograms and the Haar wavelet transform. We show that both have strong theoretical accuracy guarantees on variance. In practice, both methods are fast and require minimal computation and communication resources. Our experiments show that the wavelet approach is most accurate in high privacy settings, while the hierarchical approach dominates for weaker privacy requirements.
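As a rough illustration of the hierarchical-histogram idea mentioned in the abstract, the following Python sketch answers a range query from locally perturbed reports. It is a minimal sketch under stated assumptions, not the authors' protocol: it assumes a domain of size 2**h, lets each user sample one level of a binary tree, and uses k-ary randomized response as the local perturbation. All function names (`grr_perturb`, `grr_estimate`, `dyadic_nodes`, `collect`, `range_query`) are hypothetical and chosen only for this example.

```python
import math
import random


def grr_perturb(value, k, eps, rng):
    """k-ary randomized response: keep the true value with probability p,
    otherwise report one of the other k - 1 values uniformly at random."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k, eps):
    """Unbiased estimate of the fraction of reporting users holding each value."""
    n = len(reports)
    if n == 0:
        return [0.0] * k
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]


def dyadic_nodes(lo, hi, h):
    """Decompose the inclusive range [lo, hi] over a domain of size 2**h
    into a small set of (level, index) nodes of a binary tree."""
    nodes = []

    def visit(level, index):
        width = 1 << (h - level)
        start, end = index * width, (index + 1) * width - 1
        if hi < start or end < lo:
            return  # node lies entirely outside the range
        if lo <= start and end <= hi:
            nodes.append((level, index))  # node lies entirely inside the range
            return
        visit(level + 1, 2 * index)
        visit(level + 1, 2 * index + 1)

    visit(0, 0)
    return nodes


def collect(values, h, eps, rng):
    """Assumption: each user samples one level in 1..h and sends a single
    perturbed report of her node at that level."""
    reports = {level: [] for level in range(1, h + 1)}
    for v in values:
        level = rng.randint(1, h)
        node = v >> (h - level)
        reports[level].append(grr_perturb(node, 1 << level, eps, rng))
    return reports


def range_query(reports, lo, hi, h, eps):
    """Estimate the fraction of users whose value lies in [lo, hi]."""
    est = {lvl: grr_estimate(r, 1 << lvl, eps) for lvl, r in reports.items()}
    total = 0.0
    for level, index in dyadic_nodes(lo, hi, h):
        # The root covers the whole domain, so its fraction is exactly 1.
        total += 1.0 if level == 0 else est[level][index]
    return total
```

Calling `collect(values, h, eps, random.Random(seed))` once and then `range_query(reports, lo, hi, h, eps)` for each interval reuses the same noisy reports across queries. Because every range decomposes into a number of tree nodes that grows only logarithmically with the domain size, the noise added to an answer accumulates over few estimates rather than over every point in the interval.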
doi:10.1145/3221269.3223039 dblp:conf/ssdbm/GhaneKR18 fatcat:o7ramlijgvf5lotcgjqrcisecq