Accelerating Large-scale Image Retrieval on Heterogeneous Architectures with Spark

Hanli Wang, Bo Xiao, Lei Wang, Jun Wu
2015 Proceedings of the 23rd ACM international conference on Multimedia - MM '15  
Apache Spark is a general-purpose cluster computing system for big data processing and has drawn much attention recently from several fields, such as pattern recognition, machine learning and so on. Unlike MapReduce, Spark is especially suitable for iterative and interactive computations. With the computing power of Spark, a utility library, referred to as IRlib, is proposed in this work to accelerate large-scale image retrieval applications by jointly harnessing the power of GPU. Similar to
more » ... built-in machine learning library of Spark, namely MLlib, IRlib fits into the Spark APIs and benefits from the powerful functionalities of Spark. The main contributions of IRlib lie in two-folds. First, IRlib provides a uniform set of APIs for the programming of image retrieval applications. Second, the computational performance of Spark equipped with multiple GPUs is dramatically boosted by developing high performance modules for common image retrieval related algorithms. Comparative experiments concerning large-scale image retrieval are carried out to demonstrate the significant performance improvement achieved by IRlib as compared with single CPU thread implementation as well as Spark without GPUs employed.
doi:10.1145/2733373.2806392 dblp:conf/mm/WangXWW15 fatcat:r73v75iglffqpbeiej34aei7xi