LazyLSH

Yuxin Zheng, Qi Guo, Anthony K.H. Tung, Sai Wu
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Due to the "curse of dimensionality" problem, it is very expensive to process nearest neighbor (NN) queries in high-dimensional spaces; hence, approximate approaches such as Locality-Sensitive Hashing (LSH) are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target the ℓ1 and ℓ2 spaces, while, as shown in previous work, fractional distance metrics (ℓp metrics with 0 < p < 1) can provide more insightful results than the usual ℓ1 and ... metrics for data mining and multimedia applications. However, no existing work supports multiple fractional distance metrics using one index. In this paper, we propose LazyLSH, which answers approximate nearest neighbor queries for multiple ℓp metrics with theoretical guarantees. Unlike previous LSH approaches, which need to build one dedicated index for every query space, LazyLSH uses a single base index to support the computations in multiple ℓp spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.

CCS Concepts: Information systems → Nearest-neighbor search

Keywords: Locality sensitive hashing; Nearest neighbor search; ℓp metrics

ℓp metric is application-dependent and needs to be tuned or adjusted for each application [1, 16, 25, 20]. As an example, Table 1 shows the accuracy of the kNN classifier [17] under different ℓp metrics. We test Mnist [29], Sun [19] and seven datasets from the UCI ML repository¹. The ground-truth classification results are provided by the datasets themselves. For each query point, we retrieve its

¹ http://archive.ics.uci.edu/ml/ The datasets used are: Ionosphere (Ionos), Musk, Breast Cancer Wisconsin (BCW), Statlog Vehicle Silhouettes (SVS), Segmentation (Segme), Gisette (Giset) and Statlog Landsat Satellite (SLS).
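To make the role of p concrete, the following minimal Python sketch computes the ℓp distance for any p > 0, including fractional values, and runs an exact brute-force kNN vote with it. It is not LazyLSH and uses no hashing or index; the function names lp_distance and knn_classify and the two toy points are assumptions made purely for illustration.

```python
from collections import Counter


def lp_distance(x, y, p):
    """Return (sum_i |x_i - y_i|^p)^(1/p) for any p > 0 (fractional p allowed)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_classify(query, points, labels, k, p):
    """Exact brute-force kNN majority vote under the l_p distance (illustrative only)."""
    # Rank all point indices by their l_p distance to the query.
    ranked = sorted(range(len(points)),
                    key=lambda i: lp_distance(query, points[i], p))
    # Majority vote among the k closest points.
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: the nearest neighbor of the origin depends on p.
    points = [(3, 0), (2, 2)]
    labels = ["A", "B"]
    for p in (2.0, 1.0, 0.5):
        print(p, knn_classify((0, 0), points, labels, k=1, p=p))
```

In this toy setup the point (2, 2) is closest to the origin under ℓ2, while (3, 0) is closest under ℓ1 and ℓ0.5, so the predicted label changes with p. This is the kind of sensitivity that motivates tuning p per application.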
doi:10.1145/2882903.2882930 dblp:conf/sigmod/ZhengGTW16 fatcat:l5eispcnzvfkllq4l2jkygaqaq