Anonymizing Unstructured Data [article]

Rajeev Motwani, Shubha U. Nabar
2008 arXiv pre-print
In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide O(k log k) and O(1)-approximation algorithms for this problem. We demonstrate the applicability of our algorithms to the America Online query log dataset.
arXiv:0810.5582v2 fatcat:przldunth5d3hctxsypvs74ula