Exploiting Geometry for Support Vector Machine Indexing [chapter]

Navneet Panda, Edward Y. Chang
2005 Proceedings of the 2005 SIAM International Conference on Data Mining  
Support Vector Machines (SVMs) have been adopted by many data-mining and information-retrieval applications for learning a mining or query concept, and then retrieving the "top-k" best matches to the concept. However, when the dataset is large, naively scanning the entire dataset to find the top matches is not scalable. In this work, we propose a kernel indexing strategy to substantially prune the search space and thus improve the performance of top-k queries. Our kernel indexer (KDX) takes advantage of the underlying geometric properties and quickly converges on an approximate set of top-k instances of interest. More importantly, once the kernel (e.g., Gaussian kernel) has been selected and the indexer has been constructed, the indexer can work with different kernel-parameter settings (e.g., γ and σ) without performance compromise. Through theoretical analysis, and empirical studies on a wide variety of datasets, we demonstrate KDX to be very effective.
doi:10.1137/1.9781611972757.29 dblp:conf/sdm/PandaC05 fatcat:yyi7fbt6sjdadge45p2zxk72ki