Indexing and searching 100M images with map-reduce

Diana Moise, Denis Shestakov, Gylfi Gudmundsson, Laurent Amsaleg
2013 Proceedings of the 3rd ACM conference on International conference on multimedia retrieval - ICMR '13  
Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index is now reaching millions, if not billions, of items; (ii) the computers we use every day now come with multiple cores; and (iii) hardware is becoming more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using
Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not, however, guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data, and RAM, a scarce resource, has to be shared. Furthermore, algorithms require major redesign and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection and automatic re-runs and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize the time, effort and energy invested by other researchers and practitioners working in similar directions.
High-Dimensional Indexing, Map-Reduce, Hadoop.
2. Index search with Map-Reduce. For some applications, throughput is far more important than the response time of individual queries. We propose an index search scheme that is geared toward throughput, as it processes large batches of queries very efficiently. We show this search technique is essentially bounded
doi:10.1145/2461466.2461470 dblp:conf/mir/MoiseSGA13 fatcat:jiaj456w5zgujaqeq22ualewty
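The two ideas named in the abstract, clustering-based indexing expressed as map and reduce steps and throughput-oriented batch search, can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use Hadoop: it assumes a single flat level of cluster representatives, Euclidean distance, and an exhaustive scan inside the one cluster a query is routed to. All function names (`map_assign`, `shuffle`, `build_index`, `batch_search`) and the toy 2-D vectors are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import dist


def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))


def map_assign(records, centroids):
    """Map step: emit (cluster_id, (record_id, vector)) for every record."""
    for rec_id, vec in records:
        yield nearest(vec, centroids), (rec_id, vec)


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def build_index(descriptors, centroids):
    """Index creation: each group plays the role of one cluster of the index."""
    return shuffle(map_assign(descriptors, centroids))


def batch_search(index, queries, centroids, k=1):
    """Batch search: route all queries first, then visit each cluster once."""
    routed = shuffle(map_assign(queries, centroids))
    results = {}
    for cluster_id, batch in routed.items():
        members = index.get(cluster_id, [])
        for q_id, q_vec in batch:
            ranked = sorted(members, key=lambda m: dist(q_vec, m[1]))
            results[q_id] = [m_id for m_id, _ in ranked[:k]]
    return results


# Toy data (assumed for illustration): two representatives, four descriptors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [("a", (0.0, 1.0)), ("b", (1.0, 0.0)),
               ("c", (9.0, 10.0)), ("d", (10.0, 11.0))]
queries = [("q1", (0.2, 0.9)), ("q2", (9.1, 10.2))]

index = build_index(descriptors, centroids)
results = batch_search(index, queries, centroids, k=1)
```

Grouping the queries by cluster before touching the index means each cluster is read once per batch rather than once per query, which is the sense in which such a scheme favors throughput over the latency of any single query. Because only one cluster is scanned per query, the answers are approximate in general.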