9,643 Hits in 6.2 sec

A New Indexing Method for High Dimensional Dataset [chapter]

Jiyuan An, Yi-Ping Phoebe Chen, Qinying Xu, Xiaofang Zhou
2005 Lecture Notes in Computer Science  
Since R-tree-type index structures are known to suffer from the "curse of dimensionality", Pyramid-tree-type index structures, which are based on the B-tree, have been proposed to break the curse  ...  We propose a new indexing method based on the surface of dimensionality. We prove that the Pyramid-tree technique is a special case of our method.  ...  Acknowledgments The work reported in this paper was partially supported by the Australian Research Council's Discovery Project Grants DP0344488 and DP0345710.  ... 
doi:10.1007/11408079_35 fatcat:d2klnvozefbklbp5rqbqvvpbbi

Eclipse: Practicability Beyond kNN and Skyline [article]

Jinfei Liu, Li Xiong, Qiuchen Zhang, Jian Pei, Jun Luo
2018 arXiv   pre-print
Furthermore, we propose a novel index-based algorithm utilizing duality transform with much better efficiency.  ...  The experimental results on the real NBA dataset and the synthetic datasets demonstrate the effectiveness and efficiency of our eclipse algorithms.  ...  Index-based Algorithm for High Dimensional Space In this subsection, we show how to build the index structures and process the eclipse query in high dimensional space.  ... 
arXiv:1707.01223v2 fatcat:rnzyzvv5gfbrvbwvpufwgdpipi


Ju Han Kim, Lucila Ohno-Machado, Isaac S. Kohane
2000 Biocomputing 2001  
This paper describes a novel method to organize a complex high-dimensional space into successive lower-dimensional spaces based on the geometric properties of the data structure in the absence of a priori  ...  The matrix incision tree algorithm reveals the hierarchical structural organization of observed data by determining the successive hyperplanes that 'optimally' separate the data hyperspace.  ...  Acknowledgments LOM was funded under grant R29 LM06538-01 from the National Library of Medicine, NIH.  ... 
doi:10.1142/9789814447362_0004 fatcat:lmyf65ngkbfiveqv3jhhmpr4fq

An efficient subspace sampling framework for high-dimensional data reduction, selectivity estimation, and nearest-neighbor search

C.C. Aggarwal
2004 IEEE Transactions on Knowledge and Data Engineering  
The method is naturally able to estimate the local implicit dimensionalities of each point very effectively and, thereby, create a variable-dimensionality reduced representation of the data.  ...  One challenge in designing effective data reduction techniques is preserving the ability to use the reduced format directly for a wide range of database and data mining applications  ...  This is because of the high dimensionality of the problem, which is outside the reach of normal index structures.  ... 
doi:10.1109/tkde.2004.49 fatcat:gsjbi7vwenghtnddqq3gstjrby

BM+-Tree: A Hyperplane-Based Index Method for High-Dimensional Metric Spaces [chapter]

Xiangmin Zhou, Guoren Wang, Xiaofang Zhou, Ge Yu
2005 Lecture Notes in Computer Science  
In this paper, we propose a novel high-dimensional index method, the BM+-tree, to support efficient processing of similarity search queries in high-dimensional spaces.  ...  The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage  ...  Since retrieving multidimensional data always incurs very high, and sometimes prohibitively high, costs for large datasets, the search for effective index structures to support high dimensional similarity  ... 
doi:10.1007/11408079_36 fatcat:rsebh5afv5ambnan4ojz4hp55u

A Geometric Algorithm for Learning Oblique Decision Trees [chapter]

Naresh Manwani, P. S. Sastry
2009 Lecture Notes in Computer Science  
Motivated by this, our algorithm uses a strategy, based on some recent variants of SVM, to assess the hyperplanes in such a way that the geometric structure in the data is taken into account.  ...  In this paper we present a novel algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess goodness of hyperplanes at each node.  ...  In this paper we present a new decision tree learning algorithm which is based on the idea of capturing the geometric structure.  ... 
doi:10.1007/978-3-642-11164-8_5 fatcat:tjgue56vmnfqljrd7tvdpqioiu
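For readers unfamiliar with the distinction: an axis-parallel node tests a single attribute (x[j] <= t), while an oblique node routes a sample by a full hyperplane. A minimal illustration (w and b are simply given here, whereas the paper learns them with SVM-like criteria):

```python
def oblique_route(x, w, b):
    """Route a sample by the sign of the hyperplane test w.x + b."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "left" if s <= 0 else "right"

# A single oblique test can separate data that axis-parallel tests need
# many splits to approximate, e.g. the diagonal boundary x1 + x2 = 3:
oblique_route((1, 1), w=(1, 1), b=-3)  # "left"  (1 + 1 - 3 <= 0)
oblique_route((2, 2), w=(1, 1), b=-3)  # "right" (2 + 2 - 3 > 0)
```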

Adaptive Cluster Distance Bounding for High-Dimensional Indexing

Sharadh Ramaswamy, Kenneth Rose
2011 IEEE Transactions on Knowledge and Data Engineering  
We propose a new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster-based index.  ...  Clustering, on the other hand, exploits inter-dimensional correlations and is thus a more compact representation of the dataset.  ...  We first note that our index structure is flat or scan-based (and not tree-based).  ... 
doi:10.1109/tkde.2010.59 fatcat:2o26uoj4zvd6le2phyeujlm6x4
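The separating-hyperplane bound the abstract describes rests on a simple geometric fact: for Voronoi clusters, every point of a neighboring cluster lies beyond the perpendicular bisector between the two centroids, so the query's distance to that bisector lower-bounds its distance to the whole cluster. A hedged sketch under that assumption (the function name is illustrative):

```python
import numpy as np

def hyperplane_lower_bound(q, c_home, c_other):
    """Distance from q (in the Voronoi cell of c_home) to the bisector
    hyperplane between c_home and c_other; every point of c_other's
    cluster is at least this far from q."""
    n = c_other - c_home
    # Signed distance of q to the bisector ||x - c_home|| = ||x - c_other||,
    # positive while q is on c_home's side.
    return (np.dot(c_other, c_other) - np.dot(c_home, c_home)
            - 2.0 * np.dot(q, n)) / (2.0 * np.linalg.norm(n))

# If the bound already exceeds the current kNN radius, the entire
# neighboring cluster can be skipped without scanning it.
```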

An Indexing Approach for Representing Multimedia Objects in High-Dimensional Spaces Based on Expectation Maximization Algorithm [chapter]

Giuseppe Boccignone, Vittorio Caggiano, Carmine Cesarano, Vincenzo Moscato, Lucio Sansone
2005 Lecture Notes in Computer Science  
In this manner our tree provides a simple and practical solution to index clustered data and support efficient retrieval of the nearest neighbors in high dimensional object spaces.  ...  In this paper we introduce a new indexing approach to representing multimedia object classes generated by the Expectation Maximization clustering algorithm in a balanced and dynamic tree structure.  ...  The clustered data structure can be efficiently used (e.g., by means of a recursive application on the data space) to build indexes (e.g., search-trees) for high dimensional data sets, which support efficient  ... 
doi:10.1007/11551898_8 fatcat:z3zczsle6jeqbalg6l2lgvlkla

Reverse k-Nearest Neighbor Search Based on Aggregate Point Access Methods [chapter]

Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle, Alexander Katzdobler
2009 Lecture Notes in Computer Science  
Compared to the limitations of existing methods for the RkNN search, our approach works on top of Multi-Resolution Aggregate (MRA) versions of any index structures for multi-dimensional feature spaces  ...  Our solution outperforms the state-of-the-art RkNN algorithms in terms of query execution times because it exploits advanced strategies for pruning index entries.  ...  Our approach is based on an index structure I for point data which is based on the concept of minimal-bounding-rectangles, e.g. the R-tree family including the R-tree [4], the R*-tree [5] and the  ... 
doi:10.1007/978-3-642-02279-1_32 fatcat:gxgf3jtodfenpdriaaplavz6iy
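As a baseline for the query this paper accelerates, a brute-force RkNN follows directly from the definition: p belongs to RkNN(q) iff q ranks among p's k nearest neighbors. A naive quadratic sketch (the index-based methods above exist precisely to prune most of this work):

```python
import math

def rknn(q, points, k):
    """Brute-force reverse k-nearest-neighbor query: return every p in
    points whose k nearest neighbors (within points) would include q."""
    result = []
    for p in points:
        # Distance from p to its k-th nearest neighbor among the others.
        dists = sorted(math.dist(p, r) for r in points if r != p)
        kth = dists[k - 1] if len(dists) >= k else float("inf")
        if math.dist(p, q) <= kth:
            result.append(p)
    return result
```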

Semi-supervised Node Splitting for Random Forest Construction

Xiao Liu, Mingli Song, Dacheng Tao, Zicheng Liu, Luming Zhang, Chun Chen, Jiajun Bu
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition  
To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation.  ...  Experimental results on publicly available datasets demonstrate the superiority of our method.  ...  The second family assumes that the high-dimensional data roughly lie on a low-dimensional manifold such that the unlabeled data can be efficiently used to infer the structure of the manifold without being  ... 
doi:10.1109/cvpr.2013.70 dblp:conf/cvpr/LiuSTLZCB13 fatcat:ywawii4ocredpml2cld6pcxtty

Approximate kNN Classification for Biomedical Data [article]

Panagiotis Anagnostou, Petros T. Barmbas, Aristidis G. Vrahatis, Sotiris K. Tasoulis
2020 arXiv   pre-print
In this work, we proposed the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data, focusing on a particular methodology tailored for high dimensional  ...  However, the ultra-high dimensionality that characterizes scRNA-seq imposes a computational bottleneck, while prediction power can be affected by the "Curse of Dimensionality".  ...  We particularly focus on the MRPT algorithm [18], which has been proven to be the fastest approximate method for very high dimensional data.  ... 
arXiv:2012.02149v1 fatcat:2ojbwhmrszg65e7wtof7uqjoru
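MRPT itself grows multiple sparse random-projection trees and votes across them; as a much-simplified illustration of the underlying idea, one can bucket points by the signs of a few random projections and run exact kNN only inside the query's bucket (all names below are illustrative, not the MRPT API):

```python
import math
import random
from collections import defaultdict

random.seed(0)  # deterministic projections for the example

def signature(p, dirs):
    """Sign pattern of p's projections onto the random directions."""
    return tuple(sum(d * x for d, x in zip(dr, p)) > 0 for dr in dirs)

def build_index(points, n_proj=3, dim=2):
    """Hash every point into a bucket keyed by its sign signature."""
    dirs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_proj)]
    buckets = defaultdict(list)
    for p in points:
        buckets[signature(p, dirs)].append(p)
    return dirs, buckets

def approx_knn(q, dirs, buckets, k):
    """Exact kNN restricted to the query's bucket: fast but approximate,
    since true neighbors may hash to a different bucket."""
    candidates = buckets.get(signature(q, dirs), [])
    return sorted(candidates, key=lambda p: math.dist(p, q))[:k]
```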

Ball*-tree: Efficient spatial indexing for constrained nearest-neighbor search in metric spaces [article]

Mohamad Dolatshah, Ali Hadian, Behrouz Minaei-Bidgoli
2015 arXiv   pre-print
Ball*-tree enjoys a modified space partitioning algorithm that considers the distribution of the data points in order to find an efficient splitting hyperplane.  ...  Results show that Ball*-tree performs 39%-57% faster than the original Ball-tree algorithm.  ...  While pivot-based approaches are faster in medium-sized datasets, the required number of pivots is extremely large for high-dimensional datasets [34] .  ... 
arXiv:1511.00628v1 fatcat:u27hgbpcvndyzlnqodpxicezra
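The distribution-aware splitting the abstract mentions can be illustrated, in much simplified form, by projecting points onto their principal direction and splitting at the median; this is a sketch of the idea under that assumption, not the Ball*-tree heuristic itself:

```python
import numpy as np

def split_node(points):
    """Split an (n, d) array of points by a hyperplane through the median
    of their top principal direction, so the cut follows the data's
    dominant spread instead of an arbitrary axis."""
    centered = points - points.mean(axis=0)
    # Principal direction = top right-singular vector of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    median = np.median(proj)
    return points[proj <= median], points[proj > median]
```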

Reference Point Hyperplane Trees [chapter]

Richard Connor
2016 Lecture Notes in Computer Science  
In both cases the tree with the greater mean data depth performs better in high-dimensional spaces.  ...  The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.  ...  Interestingly, however, the monotone tree performs substantially better than an equivalent hyperplane tree in high-dimensional spaces.  ... 
doi:10.1007/978-3-319-46759-7_5 fatcat:gjcf72ounrbtrfoemkntgukgvm


H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang
2005 ACM Transactions on Database Systems  
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on  ...  This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search.  ...  However, the algorithms in Arya et al. [1994, 1998] are based on a main-memory indexing structure called the bd-tree, while the problem we are considering is when the data and indexes are  ... 
doi:10.1145/1071610.1071612 fatcat:htasun7ycje43gbgyqgrise6ua
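The iDistance mapping described in the abstract can be sketched directly: each point is assigned to its nearest reference point i and keyed by i * C + dist(p, ref_i), so any sorted 1-D structure (a B+-tree in the paper) can index it. A minimal sketch (the constant C and the names are illustrative):

```python
import math

def idistance_key(p, refs, C=1000.0):
    """Map point p to a 1-D key: i * C + dist(p, refs[i]) for the nearest
    reference point i. C must exceed any within-partition distance so
    partitions occupy disjoint key ranges."""
    i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
    return i * C + math.dist(p, refs[i])

# A kNN query with radius r probes, per partition i, the key range
# [i*C + dist(q, refs[i]) - r, i*C + dist(q, refs[i]) + r],
# enlarging r until k neighbors are confirmed.
refs = [(0.0, 0.0), (10.0, 10.0)]
keys = sorted(idistance_key(p, refs) for p in [(1, 1), (9, 9), (0, 2)])
```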

A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data [chapter]

Charu C. Aggarwal
2006 Proceedings of the 2006 SIAM International Conference on Data Mining  
Existing techniques which try to perform dimensionality reduction are too slow for practical use in the high dimensional case. These techniques try to find global discriminants in the data.  ...  High dimensional data presents a challenge to the classification problem because of the difficulty in modeling the precise relationship between the large number of feature variables and the class variable  ...  Therefore, a hierarchical traversal is performed on the tree structure using the same rules as utilized by the tree construction algorithm in defining direct assignments.  ... 
doi:10.1137/1.9781611972764.32 dblp:conf/sdm/Aggarwal06a fatcat:aw4qsw3p6bgkhnanc6cxgvfunm
Showing results 1 — 15 out of 9,643 results