RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce

Aleksandar Stupar, Sebastian Michel, Ralf Schenkel
2010 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval  
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match, as the hashing principle of LSH can be smoothly integrated into the mapping phase of MapReduce. The LSH algorithm assigns similar objects to the same fragments in the distributed file system, which enables an effective selection of potential candidate neighbors that are then reduced to the set of K-Nearest Neighbors. We address problems arising from the different characteristics of MapReduce and LSH to achieve an efficient search process on the one hand and high LSH accuracy on the other. We discuss several pitfalls and describe in detail how to circumvent them. We evaluate RankReduce using both synthetic data and a dataset obtained from Flickr.com, demonstrating the suitability of the approach.
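The idea in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the authors' implementation: the abstract does not say which hash family RankReduce uses, so the sketch assumes a quantized random-projection family for Euclidean distance, and the names `make_hash`, `map_phase`, and `knn_query` are hypothetical. An in-memory dictionary stands in for the fragments of the distributed file system.

```python
import heapq
import math
import random
from collections import defaultdict


def make_hash(dim, num_projections, bucket_width, seed=0):
    """Build one LSH function from quantized random projections (assumed family)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_projections)]
    offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_projections)]

    def lsh(vector):
        # Each projection contributes one integer; the tuple is the bucket key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, vector)) + b) / bucket_width)
            for plane, b in zip(planes, offsets)
        )

    return lsh


def map_phase(dataset, lsh):
    """Map step: emit (bucket key, object). Grouping by key mimics the shuffle,
    so objects with the same key end up in the same fragment."""
    buckets = defaultdict(list)
    for obj_id, vector in dataset.items():
        buckets[lsh(vector)].append((obj_id, vector))
    return buckets


def knn_query(query, buckets, lsh, k):
    """Select candidates from the query's bucket, then reduce them to the k closest."""
    candidates = buckets.get(lsh(query), [])
    scored = ((math.dist(query, vector), obj_id) for obj_id, vector in candidates)
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    data = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (10.0, 10.0)}
    lsh = make_hash(dim=2, num_projections=2, bucket_width=4.0, seed=1)
    buckets = map_phase(data, lsh)
    print(knn_query((0.0, 0.0), buckets, lsh, k=2))
```

Only the bucket that the query hashes to is scanned, which is what keeps the candidate set small. The result is approximate: a true neighbor that falls into a different bucket is missed, and LSH schemes commonly use several independent hash tables to reduce that risk.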
dblp:conf/sigir/StuparMS10 fatcat:fo6xocjhk5bjznlkinossf2pme