t-Spanners for metric space searching

Gonzalo Navarro, Rodrigo Paredes, Edgar Chávez
2007 Data & Knowledge Engineering  
The problem of Proximity Searching in Metric Spaces consists in finding the elements of a set which are close to a given query under some similarity criterion. In this paper we present a new methodology to solve this problem, which uses a t-spanner G ′ (V, E) as the representation of the metric database. A t-spanner is a subgraph G ′ (V, E) of a graph G(V, A), such that E ⊆ A and G ′ approximates the shortest path costs over G within a precision factor t. Our key idea is to regard the t-spanner
more » ... as an approximation to the complete graph of distances among the objects, and to use it as a compact device to simulate the large matrix of distances required by successful search algorithms such as AESA. The t-spanner properties imply that we can use shortest paths over G ′ to estimate any distance with bounded-error factor t. For this sake, several t-spanner construction, updating, and search algorithms are proposed and experimentally evaluated. We show that our technique is competitive against current approaches. For example, in a metric space of documents our search time is only 9% over AESA, yet we need just 4% of its space requirement. Similar results are obtained in other metric spaces. Finally, we conjecture that the essential metric space property to obtain good tspanner performance is the existence of clusters of elements, and enough empirical evidence is given to support this claim. This property holds in most real-world metric spaces, so we expect that t-spanners will display good behavior in most practical applications. Furthermore, we show that t-spanners have a great potential for improvements.
doi:10.1016/j.datak.2007.05.002 fatcat:t73cdcgftjcnvnuawrnxcb35ne