Cache-conscious performance optimization for similarity search

Maha Alabduljalil, Xun Tang, Tao Yang
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
All-pairs similarity search can be implemented in two stages. The first stage is to partition the data and group potentially similar vectors. The second stage is to run a set of tasks where each task compares a partition of vectors with other candidate partitions. Because of data sparsity, accessing feature vectors in memory for runtime comparison in the second stage, incurs significant overhead due to the presence of memory hierarchy. This paper proposes a cache-conscious data layout and
more » ... sal optimization to reduce the execution time through size-controlled data splitting and vector coalescing. It also provides an analysis to guide the optimal choice for the parameter setting. Our evaluation with several application datasets verifies the performance gains obtained by the optimization and shows that the proposed scheme is upto 2.74x as fast as the cache-oblivious baseline.
doi:10.1145/2484028.2484077 dblp:conf/sigir/AlabduljalilTY13 fatcat:ptk7klfakbhbnpqwzdmdcs2wxi