Ertem Tuncel, Hakan Ferhatosmanoglu, Kenneth Rose
2002 Proceedings of the tenth ACM international conference on Multimedia - MULTIMEDIA '02  
In this paper, we introduce a novel indexing technique based on efficient compression of the feature space for approximate similarity searching in large multimedia databases. Its main novelty is that state-of-the-art tools from the discipline of data compression are adopted to optimize the complexity-performance tradeoff in large data sets. The design procedure optimizes the query access time by jointly accounting for both database distribution and query statistics. We achieve efficient compression by using appropriate vector quantization (VQ) techniques, namely multi-stage VQ and split-VQ, which are especially suited for limited-memory applications. We partition the data set using the accumulated query history, and each partition of data points is separately compressed using a vector quantizer tailored to its distribution. The employed VQ techniques inherently provide a spectrum of points to choose from on the time/accuracy plane. This property is especially crucial for large multimedia databases where I/O time is a bottleneck, because it offers the flexibility to trade time for better accuracy. Our experiments demonstrate speedups by a factor of 20 to 35 over a VA-file technique that has been adapted for approximate nearest neighbor searching.
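As an illustration of the multi-stage VQ idea named in the abstract, the following is a minimal sketch, not the authors' design procedure. It assumes plain Lloyd's k-means for codebook training and squared Euclidean distance, and it ignores the query-history partitioning and split-VQ entirely. All function names (`kmeans`, `nearest`, `train_msvq`, `encode`, `decode`) are hypothetical.

```python
import numpy as np

def nearest(data, codebook):
    """Index of the closest codeword (squared Euclidean) for each row."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption); returns a (k, dim) codebook."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(data, codebook)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # leave an empty cluster's codeword unchanged
                codebook[j] = members.mean(axis=0)
    return codebook

def train_msvq(data, k, stages):
    """Train one codebook per stage, each on the previous stage's residual."""
    codebooks, residual = [], data.copy()
    for s in range(stages):
        cb = kmeans(residual, k, seed=s)
        codebooks.append(cb)
        residual = residual - cb[nearest(residual, cb)]
    return codebooks

def encode(data, codebooks):
    """Return an (n, stages) array holding one codeword index per stage."""
    codes, residual = [], data.copy()
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, axis=1)

def decode(codes, codebooks, stages=None):
    """Reconstruct from the first `stages` stages (all stages by default)."""
    stages = len(codebooks) if stages is None else stages
    return sum(codebooks[s][codes[:, s]] for s in range(stages))
```

Under these assumptions, decoding with fewer stages reads fewer indices and yields a coarser approximation, which is one way to picture the spectrum of operating points on the time/accuracy plane that the abstract describes.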
doi:10.1145/641007.641117 dblp:conf/mm/TuncelFR02 fatcat:e6wpplvf7zgkxio3cptmqt4lmm