Privacy Preserving Count Statistics [article]

Lu Yu and Oluwakemi Hambolu and Yu Fu and Jon Oakley and Richard R. Brooks
2019 arXiv pre-print
The ability to preserve user privacy and anonymity is important. One of the safest ways to maintain privacy is to avoid storing personally identifiable information (PII), which poses a challenge for maintaining useful user statistics. Probabilistic counting has been used to find the cardinality of a multiset when precise counting is too resource-intensive. In this paper, probabilistic counting is used as an anonymization technique that provides a reliable estimate of the number of unique users.
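To make the idea of counting users without storing PII concrete, here is a minimal sketch. It assumes a bitmap-style probabilistic counter (linear counting) with `m` one-bit registers and SHA-256 as the hash. The class name `UniqueUserCounter`, the choice of hash, and the estimator are illustrative assumptions, not necessarily the paper's exact scheme. Only register indices are retained and the identifier itself is discarded. Because many possible identifiers map to each register, a set bit does not by itself single out one user.

```python
import hashlib
import math


class UniqueUserCounter:
    """Hypothetical bitmap counter (linear counting); stores only m registers, never user IDs."""

    def __init__(self, m: int) -> None:
        self.m = m                      # register (bitmap) size, an assumed parameter
        self.bits = bytearray(m)        # one byte per register, used as a 0/1 flag

    def add(self, user_id: str) -> None:
        # Hash the identifier and keep only the register index; the ID is not stored.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits[index] = 1

    def raw_count(self) -> int:
        # Uncorrected estimate: number of set registers. Undercounts when users collide.
        return sum(self.bits)

    def estimate(self) -> float:
        # Linear-counting correction for hash collisions: n ~ -m * ln(V),
        # where V is the fraction of registers that are still zero.
        zeros = self.m - self.raw_count()
        if zeros == 0:
            raise ValueError("bitmap saturated; use a larger register size m")
        return -self.m * math.log(zeros / self.m)
```

Usage: create `UniqueUserCounter(1024)`, call `add()` once per visit, and read `estimate()`. Repeat visits by the same user set the same register, so they are not double-counted. The choice of `m` matters: if the bitmap is too small relative to the number of users, it saturates and the estimate becomes unreliable.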
We extend previous work in probabilistic counting by considering its use for preserving user anonymity, developing application guidelines, and including hash collisions in the estimate. Our work complements previous methods by exploring why the uncorrected estimate deviates from the true value. The experimental results show that, if the proper register size is used, collision compensation provides estimates that are as good as, if not better than, those of the original probabilistic counting. We develop a new anonymity metric to precisely quantify the degree of anonymity the algorithm provides.
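The effect of hash collisions on an uncorrected count can be illustrated with the standard occupancy argument behind linear counting. This is an assumed model for illustration: $n$ distinct users are hashed uniformly into $m$ registers, $k$ is the number of set registers, and $V = 1 - k/m$ is the fraction left empty. The paper's own collision compensation may be formulated differently.

```latex
\Pr[\text{register empty}] = \left(1 - \tfrac{1}{m}\right)^{n} \approx e^{-n/m},
\qquad
\mathbb{E}[k] = m\left(1 - \left(1 - \tfrac{1}{m}\right)^{n}\right) \le n,

\hat{n} = \frac{\ln V}{\ln\left(1 - \tfrac{1}{m}\right)} \approx -m \ln V,
\qquad V = 1 - \frac{k}{m}.
```

The raw count $k$ falls short of $n$ whenever two users share a register, and the shortfall grows with the load $n/m$. The corrected estimate $\hat{n}$ inverts the expected empty fraction to account for those collisions. This is also why the register size $m$ must be large enough that $V$ stays well above zero.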
arXiv:1910.07020v1