On the geometry of similarity search: dimensionality curse and concentration of measure [article]

Vladimir Pestov
1999, arXiv pre-print
We suggest that the curse of dimensionality affecting similarity-based search in large datasets is a manifestation of the phenomenon of concentration of measure on high-dimensional structures. We prove that, under certain geometric assumptions on the query domain $\Omega$ and the dataset $X$, if $\Omega$ satisfies the so-called concentration property, then for most query points $x^\ast$ the ball of radius $(1+\epsilon)\,d_X(x^\ast)$ centred at $x^\ast$ contains either all points of $X$ or else at least $C_1\exp(-C_2\epsilon^2 n)$ of them. Here $d_X(x^\ast)$ is the distance from $x^\ast$ to its nearest neighbour in $X$, and $n$ is the dimension of $\Omega$.
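The effect described above can be observed in a small toy experiment. The sketch below is not the paper's proof or its construction. It assumes a hypothetical setup: the dataset $X$ and a single query $x^\ast$ are drawn uniformly from the unit cube $[0,1]^n$, distances are Euclidean, and the parameters `num_points`, `eps` and `seed` are illustrative choices. For each dimension $n$ it reports the fraction of $X$ lying in the ball of radius $(1+\epsilon)\,d_X(x^\ast)$. In low dimension that ball typically holds only the nearest neighbour. In high dimension the distances from $x^\ast$ to the points of $X$ bunch up around their mean, so the same ball tends to swallow most or all of $X$.

```python
import math
import random


def fraction_in_expanded_ball(n, num_points=500, eps=0.1, seed=0):
    """Fraction of dataset points within (1 + eps) * d_X(x*) of one random query x*.

    Hypothetical toy setup (not from the paper): X and x* are sampled
    uniformly from the unit cube [0, 1]^n, with Euclidean distance.
    """
    rng = random.Random(seed)
    dataset = [[rng.random() for _ in range(n)] for _ in range(num_points)]
    query = [rng.random() for _ in range(n)]

    # Distances from the query to every dataset point (math.dist needs Python 3.8+).
    dists = [math.dist(query, p) for p in dataset]

    d_nn = min(dists)             # d_X(x*): distance to the nearest neighbour
    radius = (1 + eps) * d_nn     # radius of the "slightly enlarged" search ball
    inside = sum(1 for d in dists if d <= radius)
    return inside / num_points


if __name__ == "__main__":
    for n in (2, 20, 200, 2000):
        frac = fraction_in_expanded_ball(n)
        print(f"n = {n:5d}: fraction of X within (1+eps)*d_X(x*) = {frac:.3f}")
```

A single random query stands in for "most query points" here. Averaging over many seeds would give a closer analogue of the statement above. Pure Python keeps the sketch dependency-free, at the cost of speed for larger `num_points` or `n`.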
arXiv:cs/9901004v1