On the geometry of similarity search: dimensionality curse and concentration of measure [article]

Vladimir Pestov
1999 arXiv   pre-print
We suggest that the curse of dimensionality affecting the similarity-based search in large datasets is a manifestation of the phenomenon of concentration of measure on high-dimensional structures.  ...  We prove that, under certain geometric assumptions on the query domain Ω and the dataset X, if Ω satisfies the so-called concentration property, then for most query points x^∗ the ball of radius (1+ε)d_X  ...  my visit to the University of Bologna in June 1998.  ... 
arXiv:cs/9901004v1 fatcat:cpu4ucdpq5akvnznopagxxi5ia

A geometric framework for modelling similarity search

V. Pestov
1999 Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99  
They include such notions as metric transform, ε-entropy, and the phenomenon of concentration of measure on high-dimensional structures.  ...  issues as analysis of complexity, indexability, and the 'curse of dimensionality.'  ...  This is certainly the case with the phenomenon of concentration of measure on high-dimensional structures, which might potentially have the greatest impact of all on both theory and practice of similarity  ... 
doi:10.1109/dexa.1999.795158 dblp:conf/dexaw/Pestov99 fatcat:ldwfkc7l4zddpmfabgl7dztxyq

Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions [article]

Vladimir Pestov
2012 arXiv   pre-print
Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees.  ...  We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω,X).  ...  [30] ), even in cases commonly accepted as "high-dimensional" (e.g. uniformly distributed data in the Hamming cube {0, 1}^d as d → ∞), the "curse of dimensionality conjecture" for proximity search remains  ... 
arXiv:0812.0146v4 fatcat:rqe3petvhzbgxi7omut4vwzpem

Indexability, concentration, and VC theory

Vladimir Pestov
2010 Proceedings of the Third International Conference on SImilarity Search and APplications - SISAP '10  
We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning.  ...  Degrading performance of indexing schemes for exact similarity search in high dimensions has long since been linked to histograms of distributions of distances and other 1-Lipschitz functions getting concentrated  ...  Approximate NN search and dimensionality reduction Approximate nearest neighbour search [39] is often said to be free from the curse of dimensionality, and the reason is that the (dimensionality) reduction  ... 
doi:10.1145/1862344.1862346 dblp:conf/sisap/Pestov10 fatcat:wke63p6rjzghtgcgeupn2uhhmm

Embedded Map Projection for Dimensionality Reduction-Based Similarity Search [chapter]

Simone Marinai, Emanuele Marino, Giovanni Soda
2008 Lecture Notes in Computer Science  
The dimensionality reduction is used in a similarity search framework whose aim is to efficiently retrieve similar objects on the basis of the Euclidean distance among high dimensional feature vectors  ...  In this paper we compare the proposed method with other dimensionality reduction techniques evaluating the retrieval performance on three data-sets.  ...  The method has been compared with other dimensionality reduction methods on a query by example retrieval application on three datasets.  ... 
doi:10.1007/978-3-540-89689-0_62 fatcat:ryeyw6x43vbjxag2j22kxnzpgy

Dimensionality reduction for similarity searching in dynamic databases

K. V. Ravi Kanth, Divyakant Agrawal, Ambuj Singh
1998 Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98  
This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data.  ...  These techniques reduce the computation time by a factor of 20 in experiments on color and texture image vectors. The error due to approximate computation of SVD is less than 10%.  ...  Searching in multiple dimensions has been extensively researched in the database and computational geometry literature.  ... 
doi:10.1145/276304.276320 dblp:conf/sigmod/KanthAS98 fatcat:tlgv5xput5c5jgexfy7s5pp44u

Dimensionality Reduction for Similarity Searching in Dynamic Databases

K.V. Ravi Kanth, Divyakant Agrawal, Amr El Abbadi, Ambuj Singh
1999 Computer Vision and Image Understanding  
This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data.  ...  These techniques reduce the computation time by a factor of 20 in experiments on color and texture image vectors. The error due to approximate computation of SVD is less than 10%.  ...  Searching in multiple dimensions has been extensively researched in the database and computational geometry literature.  ... 
doi:10.1006/cviu.1999.0762 fatcat:orevfr2wfzhbvoppqnbqkiqnna

Dimensionality reduction for similarity searching in dynamic databases

K. V. Ravi Kanth, Divyakant Agrawal, Ambuj Singh
1998 SIGMOD record  
This phenomenon, generally referred to as the dimensionality curse, can be circumvented by reducing the dimensionality of the data.  ...  These techniques reduce the computation time by a factor of 20 in experiments on color and texture image vectors. The error due to approximate computation of SVD is less than 10%.  ...  Searching in multiple dimensions has been extensively researched in the database and computational geometry literature.  ... 
doi:10.1145/276305.276320 fatcat:hrettisonndlreq2zd5rpfib6u

Feature-based similarity search in 3D object databases

Benjamin Bustos, Daniel A. Keim, Dietmar Saupe, Tobias Schreck, Dejan V. Vranić
2005 ACM Computing Surveys  
Over the last few years, a strong interest in methods for 3D similarity search has arisen, and a growing number of competing algorithms for content-based retrieval of 3D objects have been proposed.  ...  In the case of images and video, the growth of digital data has been observed since the introduction of 2D capture devices.  ...  ACKNOWLEDGMENTS We thank the editors of ACM Computing Surveys and the anonymous referees for their helpful comments on the earlier version of this article.  ... 
doi:10.1145/1118890.1118893 fatcat:qci4gin7j5buzezfetwvbu7oei

Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments [article]

Aleksandar Stojmirovic, Vladimir Pestov
2007 arXiv   pre-print
Our scheme is based on the internal geometry of the amino acid alphabet and performs exceptionally well, for example outputting 100 nearest neighbours to any possible fragment of length 10 after scanning  ...  This type of similarity search has importance in both providing a building block to more complex algorithms and for possible use in direct biological investigations where datasets are of the order of 60  ...  The first named author (A.S.) was also supported by a Bright Future PhD scholarship awarded by the NZ Foundation for Research, Science and Technology jointly with the Fonterra Research Centre and by Victoria  ... 
arXiv:cs/0309005v4 fatcat:sm742ecd2nh5pcimxmbacqf524

Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets [article]

Aleksandar Stojmirovic
2008 arXiv   pre-print
Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.  ...  Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics.  ...  Acknowledgements I am indebted to many people and institutions who have helped me to survive and even enjoy the four years it took to produce this thesis.  ... 
arXiv:0810.5407v1 fatcat:k2mb4lcdebepzid4wtjvf24t2i

Spectral Approaches to Nearest Neighbor Search

Amirali Abdullah, Alexandr Andoni, Ravindran Kannan, Robert Krauthgamer
2014 2014 IEEE 55th Annual Symposium on Foundations of Computer Science  
We design spectral NNS algorithms whose query time depends polynomially on the dimension and logarithmically on the size of the point set.  ...  The full version of this extended abstract is available on arXiv.  ...  ACKNOWLEDGMENT Amirali Abdullah conducted much of this work at Microsoft Research Silicon Valley and was also supported in part by the NSF grant #CCF-0953066.  ... 
doi:10.1109/focs.2014.68 dblp:conf/focs/AbdullahAKK14 fatcat:dkh7hcegmvdatpb74rclslbtou

t-Spanners for metric space searching

Gonzalo Navarro, Rodrigo Paredes, Edgar Chávez
2007 Data & Knowledge Engineering  
The problem of Proximity Searching in Metric Spaces consists in finding the elements of a set which are close to a given query under some similarity criterion.  ...  For example, in a metric space of documents our search time is only 9% over AESA, yet we need just 4% of its space requirement. Similar results are obtained in other metric spaces.  ...  The measure of similarity used is related to the probability of mutations such as reversals of pieces of the sequences and other rearrangements (global similarity), or variants of edit distance (local  ... 
doi:10.1016/j.datak.2007.05.002 fatcat:t73cdcgftjcnvnuawrnxcb35ne

Similarity Search on Time Series Based on Threshold Queries [chapter]

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Peter Kunath, Alexey Pryakhin, Matthias Renz
2006 Lecture Notes in Computer Science  
The most prominent work has focused on similarity search considering either complete time series or similarity according to subsequences of time series.  ...  The performance of our solution is demonstrated by an extensive experimental evaluation on real world and artificial time series data.  ...  One time series represents the measurement of one station at a given day containing 48 values for one of 10 different parameters such as temperature, ozone concentration, etc.  ... 
doi:10.1007/11687238_19 fatcat:stgmlu5nzncs7pa3vxgydfcjza

Bio-Inspired Hashing for Unsupervised Similarity Search [article]

Chaitanya K. Ryali, John J. Hopfield, Leopold Grinberg, Dmitry Krotov
2020 arXiv   pre-print
From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.  ...  Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm BioHash that produces sparse high dimensional hash  ...  is mapped to an even higher dimensional secondary representation.  ...  Similarity search and LSH. In similarity search, given  ... 
arXiv:2001.04907v2 fatcat:2nldzpujijfb5djfxmrvemd3ja