Scalable Execution of KNN Queries using Data Parallelism Approach

Kalpana V. Metre, M U. Kharat
2018 International Journal of Engineering & Technology  
The K-nearest neighbor algorithm (KNN) is a well-known learning method used in a wide range of problem-solving domains, e.g., network monitoring, data mining, and image processing.  ...  A lot of work has been done to deal with the computational complications in constant processing of continuous queries on unbounded, continuous data streams.  ...  This finite index on queries can be accommodated in memory, which results in efficient execution of queries by avoiding frequent memory access.  ... 
doi:10.14419/ijet.v7i4.19.28286 fatcat:27xnmqoyzfg7vovqjyht25e4ku

Hybrid KNN-Join: Parallel Nearest Neighbor Searches Exploiting CPU and GPU Architectural Features [article]

Michael Gowanlock
2020 arXiv   pre-print
K Nearest Neighbor (KNN) joins are used in scientific domains for data analysis, and are building blocks of several well-known algorithms. KNN-joins find the KNN of all points in a dataset.  ...  This paper focuses on a hybrid CPU/GPU approach for low-dimensional KNN-joins, where the GPU may not yield substantial performance gains over parallel CPU algorithms.  ...  In this work, we focus on exact KNN searches in low dimensionality. The performance of low dimensional KNN searches is limited by the memory bottleneck.  ... 
arXiv:1810.04758v2 fatcat:t4t44mwcfzbfdm7uds5uvz45ey

Scaling k-Nearest Neighbours Queries (The Right Way)

Atoshum Cahsai, Nikos Ntarmos, Christos Anagnostopoulos, Peter Triantafillou
2017 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)  
Recently, parallel/distributed processing approaches have been proposed for processing k-Nearest Neighbours (kNN) queries over very large (multi-dimensional) datasets, aiming to ensure scalability.  ...  Overall, kNN queries can be processed in just tens of milliseconds (as opposed to the (tens of) seconds required by the state of the art).  ...  Briefly, executing kNN queries in this way is very costly in terms of query response times, memory usage, CPU usage, and network and disk bandwidth.  ... 
doi:10.1109/icdcs.2017.267 dblp:conf/icdcs/CahsaiNAT17 fatcat:uvkn65obrnhjrdiovg2kfejhou

GGNN: Graph-based GPU Nearest Neighbor Search [article]

Fabian Groh, Lukas Ruppert, Patrick Wieschollek, Hendrik P.A. Lensch
2021 arXiv   pre-print
Approximate nearest neighbor (ANN) search in high dimensions is an integral part of several computer vision systems and gains importance in deep learning with explicit memory representations.  ...  In this paper, we propose a novel search structure based on nearest neighbor graphs and information propagation on graphs.  ...  This bottom-up construction creates a robust searchable kNN-graph for each merged tree. It can be parallelized on multiple GPUs to even support datasets with large memory requirements.  ... 
arXiv:1912.01059v3 fatcat:zbewjskznrhexkvt2zc6vacnqy

LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

Mingjie Tang, Yongyang Yu, Ahmed R. Mahmood, Qutaibah M. Malluhi, Mourad Ouzzani, Walid G. Aref
2020 Frontiers in Big Data  
Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node.  ...  The scheduler generates query execution plans that minimize the effect of query skew.  ...  WA also contributed to the design and analysis of the manuscript, in addition to the contribution of writing the manuscript.  ... 
doi:10.3389/fdata.2020.00030 pmid:33693403 pmcid:PMC7931877 fatcat:onodyye4uzb4letmbpyembq7je

SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data

Zaher Al Aghbari, Tasneem Ismail, Ibrahim Kamel
2020 Data Science Journal  
To fill this gap, this paper proposes SparkNN, an in-memory partitioning and indexing system for answering spatial queries, such as K-nearest neighbor, on big spatial data.  ...  SparkNN is implemented on top of Apache Spark and consists of three layers to facilitate efficient spatial queries.  ...  Note that it is possible that the query may run in parallel in more than one partition depending on the location of the query point q and value of k.  ... 
doi:10.5334/dsj-2020-035 fatcat:3z7ftetwarhe5jn5lqqezi6t6e

Application-Driven Near-Data Processing for Similarity Search [article]

Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin
2017 arXiv   pre-print
At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance calculations and a global top-k sort.  ...  However, kNN is poorly supported by today's architectures because of its high memory bandwidth requirements.  ...  Queries which traverse the index and end up in the same bucket should be similar; multiple parallel trees are often used in parallel with different cut orders.  ... 
arXiv:1606.03742v2 fatcat:tgyyr4avubbzjmr7pz7obiqmle

Similarity Search on Automata Processors [article]

Vincent T. Lee, Justin Kotalik, Carlo C. Del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin
2017 arXiv   pre-print
At its core, similarity search is implemented using the k-nearest neighbors (kNN) algorithm, where computation consists of highly parallel distance calculations and a global top-k sort.  ...  In this paper, we present and evaluate a novel automata-based algorithm for kNN on the Micron Automata Processor (AP), which is a non-von Neumann near-data processing architecture.  ...  This work was also supported in part by NSF under grant CCF-1518703, gifts by Oracle, and by C-FAR, one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.  ... 
arXiv:1608.03175v2 fatcat:gqtchulalnea3mdasisfqz7sgq

Hybrid Indexing for Parallel Analysis of Spatiotemporal Point Patterns

Alexander Hohl, Irene Casas, Eric Delmelle, Wenwu Tang
2016 International Conference on GIScience Short Paper Proceedings  
We perform adaptive octree decomposition of the spatiotemporal domain and build local k-d trees to accelerate nearest neighbour search for space-time kernel density estimation (STKDE).  ...  Our parallel implementation reaches substantial speedup compared to sequential processing. The hybrid index outperforms octree decomposition alone, especially at lower levels of parallelization.  ...  Hering (2013) showed that the performance of in-memory k-d trees is best for an intermediate number of dimensions (6-13).  ... 
doi:10.21433/b3114824r3wg fatcat:nqs4jzffhjhpjgcqdbiitsqr5u

GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs

Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta, Pablo Moscato, Alexandre G. de Brevern
2012 PLoS ONE  
Results: We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition  ...  Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA-enabled GPUs.  ...  Manuel Ujaldón, for his constructive feedback on an earlier version of this manuscript. Author Contributions Conceived and designed the experiments: ASA CR PM. Performed the experiments: ASA.  ... 
doi:10.1371/journal.pone.0044000 pmid:22937144 pmcid:PMC3429408 fatcat:vpm23ylkhjcnjlbwhbeuw32aka

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings [article]

Yifan Wang
2022 arXiv   pre-print
In this survey, we first provide an overview of the "similarity query" and "similarity query processing" problems.  ...  Similarity query is the family of queries based on some similarity metrics.  ...  There are also studies of KNN join on top of other types of indexes, e.g., distributed KNN join based on tree index [21], distributed KNN join on parallel product quantization [29], localized KNN join  ... 
arXiv:2204.07922v1 fatcat:u5osyghs6vgppnj5gpnrzhae5y

Simba: Efficient In-Memory Spatial Analytics

Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Simba is based on Spark and runs over a cluster of commodity machines.  ...  We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data.  ...  In parallel, on each combined RDD partition, Simba builds an R-tree over Si and executes a local kNN join by querying each record from Ri over this tree.  ... 
doi:10.1145/2882903.2915237 dblp:conf/sigmod/XieL0LZG16 fatcat:kkus3fprcjevle5qxw7zaw2wtq

A Hardware Processing Unit for Point Sets [article]

Simon Heinzle, Gaël Guennebaud, Mario Botsch, Markus Gross
2008 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware - HWWS '04  
A key component of our design is the spatial search unit based on a kd-tree performing both kNN and eN searches.  ...  Our design is focused on fundamental and computationally expensive operations on point sets including k-nearest neighbors search, moving least squares approximation, and others.  ...  While this algorithm is a highly sequential operation, we can identify three main blocks to be executed in parallel, due to their independence in terms of memory access.  ... 
doi:10.2312/eggh/eggh08/021-031 fatcat:n3epuivc45csrop227lqubkqkm

Accelerating Exact Similarity Search on CPU-GPU Systems

Takazumi Matsumoto, Man Lung Yiu
2015 2015 IEEE International Conference on Data Mining  
Similarity search, also known as k-nearest neighbor search, is a key part of data mining applications and is also used extensively in applications such as multimedia search, where only a small subset of  ...  In recent years, the use of Graphics Processing Units (GPUs) for data mining tasks has become popular.  ...  There are four key points to such a system:  ...  Fig. 3: Examples of execution on 6 data elements  ...  memory and global memory.  ... 
doi:10.1109/icdm.2015.125 dblp:conf/icdm/MatsumotoY15 fatcat:i5rzgrm2cfek5jdudloi4ndt6q

How good are modern spatial analytics systems?

Varun Pandey, Andreas Kipf, Thomas Neumann, Alfons Kemper
2018 Proceedings of the VLDB Endowment  
In this work, we first explore the available modern spatial processing systems and then thoroughly compare them based on features and queries they support, using real-world datasets.  ...  In recent years a lot of spatial analytics systems have emerged. Existing work compares either limited features of these systems or the studies are outdated since new systems have emerged.  ...  memory is the maximum amount of memory used at any point in time for execution of a query.  ... 
doi:10.14778/3236187.3236213 fatcat:f7ujehz35ra7xljqdiwwd2hs5q