A Survey on Nearest Neighbor Search Methods

Mohammad Reza Abbasifard, Bijan Ghahremani, Hassan Naderi
2014 International Journal of Computer Applications  
The need for techniques, approaches, and algorithms for searching data has grown with advances in computer science and the increasing amount of information. This ever-increasing volume of information has led to higher time and computational complexity. Recently, different methods have been proposed to solve such problems. Among them, nearest neighbor search (NNS) is one of the best techniques to this end and has attracted the attention of many researchers. Different techniques are used for nearest neighbor search.
In addition to resolving some of these complexities, the variety of these techniques has made them suitable for different applications such as pattern recognition, searching in multimedia data, information retrieval, databases, data mining, and computational geometry, to name but a few. This paper takes a new view of the problem: it presents a comprehensive evaluation of the structures, techniques, and algorithms in this field, together with a new categorization of NNS techniques. The categorization consists of seven groups: Weighted, Reductional, Additive, Reverse, Continuous, Principal Axis, and Other techniques, which are studied, evaluated, and compared in this paper. The complexity of the structures, techniques, and algorithms used is discussed as well.

Keywords: Data Structure, kNN Algorithm, Nearest Neighbor Search, Query Processing

Definition 2. (Exact NNS): Given a set S of points in a d-dimensional space, construct a data structure which, given any query point q, finds the point in S with the smallest distance to q [2, 14]. For a small, low-dimensional dataset, exact NNS can be answered in sublinear (or even logarithmic) query time, but for a massive, high-dimensional dataset the complexity becomes exponential [2]. Fortunately, approximation can reduce the exponential complexity to polynomial time. Approximate NNS is defined as:

Definition 3. (Approximate nearest neighbor): Given a set S of points in a d-dimensional space, construct a data structure which, given any query point q, reports any point within distance at most c times the distance from q to p, where p is the point in S closest to q [2].
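As an illustration of Definitions 2 and 3, the following minimal Python sketch answers an exact nearest neighbor query by brute-force linear scan and checks whether a candidate point satisfies the c-approximate condition. It is not a data structure from the paper: the function names `exact_nn` and `is_c_approximate` are hypothetical, and Euclidean distance is assumed because the definitions leave the distance measure unspecified.

```python
import math
from typing import Sequence

Point = Sequence[float]


def exact_nn(points: Sequence[Point], q: Point) -> Point:
    """Return the point in `points` closest to q (Definition 2).

    Brute-force linear scan: O(n * d) per query, with no preprocessing.
    Euclidean distance is an assumption made for this sketch.
    """
    return min(points, key=lambda p: math.dist(p, q))


def is_c_approximate(points: Sequence[Point], q: Point,
                     candidate: Point, c: float) -> bool:
    """Check the condition of Definition 3 for a candidate answer.

    The candidate is acceptable if its distance to q is at most
    c times the distance from q to the true nearest neighbor p.
    """
    p = exact_nn(points, q)
    return math.dist(candidate, q) <= c * math.dist(p, q)


if __name__ == "__main__":
    S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    q = (1.0, 2.0)
    print(exact_nn(S, q))
    print(is_c_approximate(S, q, (0.0, 0.0), c=3.0))
```

The linear scan serves only as a correctness baseline. An approximate method may return any point that passes the check in `is_c_approximate`, which is what allows it to avoid the exponential cost of exact search in high dimensions.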
doi:10.5120/16754-7073 fatcat:465jmoyauzabrle4u62l6xld5a