Proximity Algorithms for Nearly Doubling Spaces

Lee-Ad Gottlieb, Robert Krauthgamer
2013 SIAM Journal on Discrete Mathematics  
We introduce a new problem in the study of doubling spaces: Given a point set $S$ and a target dimension $d^*$, remove from $S$ the fewest points so that the remaining set has doubling dimension at most $d^*$. We present a bicriteria approximation for this problem, and extend this algorithm to solve a group of proximity problems.
... $d^*$ (or equivalently, target doubling constant $\lambda^* = 2^{d^*}$). We thus call a data set nearly-doubling if all but a negligible fraction of the points have bounded doubling dimension. A solution to this point removal problem yields a contribution in two related areas. The first paradigm, broadly speaking, is outlier detection. In this scenario, the removed points are ignored and only the remaining points are processed. A direct motivation for this model stems from the dimension-induced clustering framework of [GHPT05], which, given a point set, seeks a subset with low intrinsic dimension. Further motivation stems from algorithms which have "slack"; that is, they give guarantees for most but not all of the point set [KRXY07, FM10]. These algorithms can be extended to nearly-doubling data sets by simply ignoring the removed points (i.e., throwing them into the slack). The second paradigm is an original one: Here, both the removed points and the remaining ones are processed, albeit by separate algorithms tailored to the properties of the two point sets.
Results. The point removal problem is NP-hard, and it is not difficult to show that the problem does not admit even an approximate multiplicative-factor solution (see Lemma 1). However, we develop a framework that yields a bicriteria approximation for this problem. In Section 3, we present bicriteria algorithms that achieve the following bounds:
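
To make the point removal problem stated above concrete, here is a minimal Python sketch. It is not the bicriteria algorithm of the paper. It assumes the standard definition of the doubling constant of a finite metric space (the largest number of radius-r/2 balls needed to cover any radius-r ball) and bounds it from above with a greedy cover. The function names `greedy_half_cover`, `doubling_constant_bound`, and `brute_force_removal` are hypothetical, and the exhaustive search is exponential in the number of points, so it is only meant for toy inputs.

```python
import math
from itertools import combinations


def greedy_half_cover(points, center, r, dist):
    """Greedily pick centers in B(center, r) that are pairwise more than r/2 apart.

    Balls of radius r/2 around the picked centers cover B(center, r),
    so the count is an upper bound on the minimum cover size.
    """
    ball = [p for p in points if dist(center, p) <= r]
    centers = []
    for p in ball:
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return len(centers)


def doubling_constant_bound(points, dist=math.dist):
    """Upper bound on the doubling constant (assumed definition, see lead-in).

    For each center x, only radii equal to a distance from x are tried:
    between two such radii the ball is unchanged while r/2 only grows.
    """
    best = 1
    for x in points:
        for r in {dist(x, p) for p in points}:
            best = max(best, greedy_half_cover(points, x, r, dist))
    return best


def brute_force_removal(points, lam_target, dist=math.dist):
    """Return indices of a smallest removal set certified by the greedy bound.

    Illustration only: tries all subsets by increasing size (exponential time).
    Because the greedy bound may overestimate, the result is feasible but is
    not guaranteed to be a true optimum.
    """
    n = len(points)
    for k in range(n + 1):
        for removed in combinations(range(n), k):
            rest = [points[i] for i in range(n) if i not in removed]
            if doubling_constant_bound(rest, dist) <= lam_target:
                return list(removed)
```

For example, on five points at pairwise distance 1 (a discrete metric passed as `dist`), the bound is 5, and with target doubling constant 2 the search removes three points, since any two remaining points have bound 2.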
doi:10.1137/120874242 fatcat:hzlzfl4iprc5phwa34ciinsohe