A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

Yongjun Piao, Minghao Piao, Cheng Hao Jin, Ho Sun Shon, Ji-Moon Chung, Buhyun Hwang, Keun Ho Ryu
2015 Mathematical Problems in Engineering  
In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features  ...  However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets.  ...  feature partitioning-based ensemble method to better classify high-dimensional data.  ... 
doi:10.1155/2015/590678 fatcat:ivwnojz3vzg35bxw2wkvhiq3oa

A Recursive Partitioning Method for Nearest Neighbor Search in High Dimensional Data

Raghunadh Pasunuri, Sobha Rani
unpublished
In this work we propose a recursive partitioning and distance-based indexing scheme for large and high-dimensional data to retrieve the nearest neighbours for a given query.  ...  In the next level for each sub-partition a reference point is selected and again it is partitioned into further sub-sub-partitions. Main advantage of this method is that it reduces the search space.  ...  Durga Bhavani for their valuable comments and suggestions.  ... 

An Efficient Unsavory Data Detection Method for Internet Big Data [chapter]

Peige Ren, Xiaofeng Wang, Hao Sun, Fen Xu, Baokang Zhao, Chunqing Wu
2015 Lecture Notes in Computer Science  
For a high-dimensional data object v in pyramid j of subspace i, we compute the height h_v (to its top) and map v into a one-dimensional value p_v = i + j + (0.5 - h_v).  ...  To realize intelligent and efficient unsavory data detection for internet big data, we proposed the i-Tree method, a semantics-based data detection method.  ... 
doi:10.1007/978-3-319-24315-3_21 fatcat:74drc44vgzfz7ec5jaqyiosx2m

Concentric hyperspaces and disk allocation for fast parallel range searching

H. Ferhatosmanoglu, D. Agrawal, A. El Abbadi
1999 Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)  
However, most of these techniques were primarily designed for two-dimensional data and for balanced partitioning of the data space.  ...  In this paper, we first establish that traditional declustering techniques do not scale for high-dimensional data. We then propose several new partitioning schemes based on concentric hyperspaces.  ...  High Dimensional Data and Balanced Partitioning Balanced partitioning is a common assumption for most of the declustering techniques. The data space is divided into AE parts in Ø dimension.  ... 
doi:10.1109/icde.1999.754977 dblp:conf/icde/FerhatosmanogluAA99 fatcat:psux4hfrsveujoerahbiudiuqu

A New Indexing Method for High Dimensional Dataset [chapter]

Jiyuan An, Yi-Ping Phoebe Chen, Qinying Xu, Xiaofang Zhou
2005 Lecture Notes in Computer Science  
However, for high dimensional data, the number of pyramids is often insufficient to discriminate data points when the number of dimensions is high.  ...  We propose a new indexing method based on the surface of dimensionality. We prove that the Pyramid tree technology is a special case of our method.  ...  The fan out of a node becomes very small due to the large size of coordinates for high dimensional data.  ... 
doi:10.1007/11408079_35 fatcat:d2klnvozefbklbp5rqbqvvpbbi

High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning [chapter]

Sachin Kulkarni, Ratko Orlandic
2006 Lecture Notes in Computer Science  
A new space partitioning method is proposed along with a new algorithm for exact similarity search in high-dimensional spaces.  ...  It relies on a new method for data-sensitive space partitioning based on explicit data clustering, which is introduced in the paper for the first time.  ...  An appropriate similarity search method must be aware of the locality of data in high dimensions. However, most methods for finding the locality of data rely on dimensionality reduction.  ... 
doi:10.1007/11827405_72 fatcat:2apjccuijfc4rnnf4tcpgbh55e

Indexing Issues in Supporting Similarity Searching [chapter]

Hanan Samet
2004 Lecture Notes in Computer Science  
This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, dimension reduction, and embedding methods.  ...  Concluding Remarks Providing indexing support for similarity searching is an important area where much work remains to be done.  ...  Dimension Reduction and Embedding Methods There are many problems with indexing high-dimensional data.  ... 
doi:10.1007/978-3-540-30542-2_57 fatcat:7nysehgscvet5asn7oho22towy

A Comprehensive Study of iDistance Partitioning Strategies for kNN Queries and High-Dimensional Data Indexing [chapter]

Michael A. Schuh, Tim Wylie, Juan M. Banda, Rafal A. Angryk
2013 Lecture Notes in Computer Science  
and high-dimensional data and highlight the inherent difficulties associated with such tasks.  ...  In this work, we perform the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance.  ...  A special thanks to all research and manuscript reviewers.  ... 
doi:10.1007/978-3-642-39467-6_22 fatcat:glvzalrln5hj3o2cxlyzm2c7py

A Class of Region-preserving Space Transformations for Indexing High-dimensional Data

Ratko Orlandic, Jack Lukaszuk
2005 Journal of Computer Science  
This study introduces a class of region preserving space transformation (RPST) schemes for accessing high-dimensional data.  ...  The techniques are experimentally compared to the Pyramid Technique, which is another example of static partitioning designed for high-dimensional data.  ...  As a result, access methods for high-dimensional data [2] [3] [4] [5] [6] [7] [8] [9] continue to attract considerable scientific interest.  ... 
doi:10.3844/jcssp.2005.89.97 fatcat:msr4o6ayizdoheoezhsiioaeg4

A Functional Measure-Based Framework for Evaluation of Multi-Dimensional Point Access Methods

Mohammadreza Keyvnpour, Najva Izadpanah
2011 Procedia Environmental Sciences  
Multi-dimensional access methods have developed for supporting fast retrieval of multi-dimensional data from multi-dimensional databases.  ...  In this framework, in order to present a comprehensive evaluation of multi-dimensional point access methods, firstly, we extended related classification of multi-dimensional point access methods in the  ...  For example, BSP-tree is a binary tree that represents a recursive complete partitioning of the data space into subspaces.  ... 
doi:10.1016/j.proenv.2011.09.127 fatcat:3bdiiy2llze4josfw4wrcbrjry

BrePartition: Optimized High-Dimensional kNN Search with Bregman Distances [article]

Yang Song, Yu Gu, Rui Zhang, Ge Yu
2020 arXiv   pre-print
Such high-dimensional space has posed significant challenges for existing kNN search algorithms with Bregman distances, which could only handle data of medium dimensionality (typically less than 100).  ...  This paper addresses the urgent problem of high-dimensional kNN search with Bregman distances. We propose a novel partition-filter-refinement framework.  ...  high-dimensional data points from the disks.  ... 
arXiv:2006.00227v1 fatcat:cvuizn6xbjebze2q77sp2x2vce

Clustering based feature selection using Partitioning Around Medoids (PAM)

Dewi Pramudi Ismi, Murinto Murinto
2020 Jurnal Informatika  
High-dimensional data contains a large number of features. With many features, high dimensional data requires immense computational resources, including space and time.  ...  There are two methods employed for dimensionality reduction purposes: feature selection and feature extraction [6].  ...  High dimensional data give much chance to overfitting problem. Small data usually leads to a simpler model, and a simpler model tends to generalize better. d.  ... 
doi:10.26555/jifo.v14i2.a17620 fatcat:dxhhhwvlbrh7ji7ea4an4eqrle

An empirical study on the visual cluster validation method with Fastmap

Z. Huang, D.W. Cheung, M.K. Ng
2001 Proceedings Seventh International Conference on Database Systems for Advanced Applications (DASFAA 2001)  
from data partitions.  ...  The visual cluster validation method attempts to tackle two clustering problems in data mining: (1) to verify partitions of data created by a clustering algorithm and (2) to identify genuine clusters  ...  Projection of high dimensional data onto low dimensional spaces for clustering is a common approach in cluster analysis. Fastmap was primarily designed for this purpose [7].  ... 
doi:10.1109/dasfaa.2001.916368 dblp:conf/dasfaa/HuangNC01 fatcat:oti4l7yidvbdfa6zjoocq4yr3q

An Efficient Semantic-Based Organization and Similarity Search Method for Internet Data Resources [chapter]

Peige Ren, Xiaofeng Wang, Hao Sun, Baokang Zhao, Chunqing Wu
2014 Lecture Notes in Computer Science  
First, the iHash normalizes the internet data objects into a high-dimensional feature space, solving the "feature explosion" problem of the feature space; second, we partition the high-dimensional data  ...  In this paper, we present the iHash method, a semantic-based organization and similarity search method for internet data resources.  ...  Jagadish et al. presented the iDistance method for k-nearest neighbor (kNN) query in a high-dimensional metric space.  ... 
doi:10.1007/978-3-642-55032-4_68 fatcat:oocsmwuqazfavbjhag3eyvd7cu

A Comprehensive Study of Challenges and Approaches for Clustering High Dimensional Data

Neelam Singh, Neha Garg, Janmejay Pant
2014 International Journal of Computer Applications  
In this paper we provide a short introduction to various approaches and challenges for high-dimensional data clustering.  ...  Most clustering methods work efficiently for low dimensional data since distance measures are used to find dissimilarities between objects.  ...  But often the data collected for research contains multiple dimension, is sparse and highly skewed, known as high dimensional data.  ... 
doi:10.5120/15995-4844 fatcat:y3g3hhvttfduxikr35uf7ecuky