Understanding of Internal Clustering Validation Measures

Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu
2010 2010 IEEE International Conference on Data Mining  
In this paper, we focus on internal clustering validation and present a detailed study of 11 widely used internal clustering validation measures for crisp clustering.  ...  Clustering validation has long been recognized as one of the vital issues essential to the success of clustering applications.  ...  There are some other internal validation measures in literature [17] [18] [19] [20]. However, some have poor performance while some are designed for data sets with specific structures.  ... 
doi:10.1109/icdm.2010.35 dblp:conf/icdm/LiuLXGW10 fatcat:4c73mmlmrrbc5jlqpt2ut5bpxu

An Approach for Assessing Clustering of Households by Electricity Usage [article]

Ian Dent, Tony Craig, Uwe Aickelin, Tom Rodden
2014 arXiv   pre-print
The approach is tested using data from 180 UK households monitored for over a year at a sampling interval of 5 minutes. Data is taken from the evening peak electricity usage period of 4pm to 8pm.  ...  To evaluate the effectiveness of the variability measures, a number of cluster validity indexes are explored with regard to how the indexes vary with the number of clusters, the number of attributes, and  ...  The work is part of a wider project to successfully apply demand side management techniques to gain benefits across the whole electricity network [14].  ... 
arXiv:1409.0718v1 fatcat:woc3q4tf4bbetmaqhjy3nupizy

An Approach for Assessing Clustering of Households by Electricity Usage

Ian Dent, Tony Craig, Uwe Aickelin, Tom Rodden
2012 Social Science Research Network  
The approach is tested using data from 180 UK households monitored for over a year at a sampling interval of 5 minutes. Data is taken from the evening peak electricity usage period of 4pm to 8pm.  ...  To evaluate the effectiveness of the variability measures, a number of cluster validity indexes are explored with regard to how the indexes vary with the number of clusters, the number of attributes, and  ...  The work is part of a wider project to successfully apply demand side management techniques to gain benefits across the whole electricity network [15].  ... 
doi:10.2139/ssrn.2828465 fatcat:w3ylhfhjczgh7ivyo2xmagti7a

A new fuzzy clustering algorithm for optimally finding granular prototypes

Ying Xie, Vijay V. Raghavan, Praveen Dhatric, Xiaoquan Zhao
2005 International Journal of Approximate Reasoning  
Experiments show that, when used in conjunction with the new cluster validity measure, the 3M algorithm produces better results on the experimental data sets than several alternatives.  ...  In order to find optimal granular prototypes through fuzzy clustering, for given data, two conditions are necessary: a good cluster validity function, which can be applied to evaluate the goodness of cluster  ...  Some results of the five algorithms are shown in Fig. 1; for the 3M algorithm, the validity values calculated by Eq. (3) for the number of clusters in the range [2, 10] are plotted in Fig. 2; and detailed  ... 
doi:10.1016/j.ijar.2004.11.002 fatcat:sardqshhsfgdbfm25hn2xnayli

A Family of Two-Dimensional Benchmark Data Sets and Its Application to Comparing Different Cluster Validation Indices [chapter]

Jorge M. Santos, Mark Embrechts
2014 Lecture Notes in Computer Science  
It is shown that even for simple 2-D data sets there is a large discrepancy on the ideal number of clusters suggested by traditional cluster validation indices.  ...  There are two main objectives in this paper: the first one is to introduce a collection of two-dimensional benchmark data sets with a wide variety of clustering characteristics that are typical for real-world  ...  In these experiments the suggested number of clusters for both clustering algorithms was determined for each of the four cluster validation indices.  ... 
doi:10.1007/978-3-319-07491-7_5 fatcat:rhx2vfckxndchf4kd3ocl6varu

A New Clustering Algorithm Based on Near Neighbor Influence [article]

Xinquan Chen
2014 arXiv   pre-print
In simulated experiments on some artificial data sets and seven real data sets, we observe that this algorithm can often achieve good clustering quality when some parameters are set to proper values.  ...  This paper presents Clustering based on Near Neighbor Influence (CNNI), a new clustering algorithm which is inspired by the idea of near neighbor and the superposition principle of influence.  ...  To verify the validity and time efficiency of this algorithm, there will be some experiments on some artificial data sets, two UCI data sets and two bmp pictures in the next subsections.  ... 
arXiv:1409.6848v1 fatcat:nx3p3267abaf5cj7btvjvfneoa

Genetic algorithm based two-mode clustering of metabolomics data

J. A. Hageman, R. A. van den Berg, J. A. Westerhuis, M. J. van der Werf, A. K. Smilde
2008 Metabolomics  
Furthermore we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes.  ...  Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized.  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided  ... 
doi:10.1007/s11306-008-0105-7 fatcat:i6oadxgaffccfjygnttlgcm47u

METTLE: A METamorphic Testing Approach to Assessing and Validating Unsupervised Machine Learning Systems

Xiaoyuan Xie, Zhiyi Zhang, Tsong Yueh Chen, Yang Liu, Pak-Lok Poon, Baowen Xu
2020 IEEE Transactions on Reliability  
Such assessments and validation tasks, however, are fairly challenging due to the absence of a priori knowledge of the data.  ...  Since unsupervised machine learning systems are widely used in many real-world applications, assessing the appropriateness of these systems and validating their implementations with respect to individual  ...  THREATS TO VALIDITY In this section, we discuss some potential factors that might affect the validity of our experiment and user evaluation study. A.  ... 
doi:10.1109/tr.2020.2972266 fatcat:ihjx5adphjdalkwyjyln7folby

An ensemble clustering model for mining concept drifting stream data in emergency management

Yong Zhang, Yi Peng, Jun Li, Gang Kou, Yong Shi
2012 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop on - DM-IKM '12  
According to the experiment, the results demonstrate the effect and performance of the proposed model in mining data streams with concept drifts.  ...  Aiming to resolve this issue, in this paper we propose an ensemble clustering model for mining concept drifting stream data in emergency management.  ...  Table 1 is the detailed introduction of some classical external validity indices. This study chooses seven external measures for the experiment, which are defined as follows.  ... 
doi:10.1145/2462130.2462132 fatcat:o7jzkwi25zckrnf566omo2cqqm

Benefit-based consumer segmentation and performance evaluation of clustering approaches: An evidence of data-driven decision-making

Deepak Arunachalam, Niraj Kumar
2018 Expert systems with applications  
Abstract This study evaluates the performance of different data clustering approaches for searching the profitable consumer segments in the UK hospitality industry.  ...  This study makes a significant contribution to literature by comparing different clustering approaches and addressing misconceptions of using these for market segmentation to support data-driven decision  ...  In this study, the validity of clusters is measured using above mentioned validity indices.  ... 
doi:10.1016/j.eswa.2018.03.007 fatcat:mtkeugp5kbbwbdhd7jxezklw3y

Performance analysis of Data Mining algorithms in Weka

Mahendra Tiwari
2012 IOSR Journal of Computer Engineering  
This paper elaborates the use of data mining techniques to help retailers identify customer profiles and behaviors for a retail store, and improve customer satisfaction and retention.  ...  The retail industry collects vast amounts of data on sales, customer buying history, goods, and service with ease of use of modern computing technology.  ...  cache. S/W tool: In all the experiments, we used Weka 3-6-6. We looked at different characteristics of the applications using classifiers to measure the accuracy in different data sets, using clusterer  ... 
doi:10.9790/0661-0633241 fatcat:cqxbupluonefbpzk5fqzq7srh4

Unsupervised speaker recognition based on competition between self-organizing maps

I. Lapidot, H. Guterman, A. Cohen
2002 IEEE Transactions on Neural Networks  
Based on the iterative clustering algorithm a validity criterion was also developed to estimate the number of speakers.  ...  We present a method for clustering the speakers from unlabeled and unsegmented conversation (with known number of speakers), when no a priori knowledge about the identity of the participants is given.  ...  In this section, we present the results of the experiments. Some of the results of these experiments have already been published [17], [18].  ... 
doi:10.1109/tnn.2002.1021888 pmid:18244483 fatcat:jxk5snp3fnaqfge2nrn4wevyfi

A hybrid unsupervised and supervised clustering applied to microarray data

Raul Malutan, Pedro Gomez Vilda, Monica Borda
2013 International Journal of Advances in Telecommunications, Electrotechnics, Signals and Systems  
It proposes a method for hybrid bi-clustering of microarray data combined with a supervised validation for determining the optimal amount of clusters of genes.  ...  This work presents a state-of-the-art in unsupervised clustering and cluster validation.  ...  For cluster validation we used both internal and external methods by computing some indexes which gave us the optimal number of clusters for each clustering method.  ... 
doi:10.11601/ijates.v2i3.21 fatcat:uvuizkmb4nfm7k62v3cmaif34m

Cluster Tendency Assessment for Fuzzy Clustering of Incomplete Data

Ludmila Himmelspach, Daniel Hommers, Stefan Conrad
2011 Proceedings of the 7th conference of the European Society for Fuzzy Logic and Technology (EUSFLAT-2011)  
On the other hand we analyse in experiments on several data sets to what extent the clustering results produced by fuzzy clustering methods for incomplete data reflect the distribution structure of data  ...  In this study, we analyse different cluster validity functions in terms of applicability on incomplete data on the one hand.  ...  Data experiments conducted in [2, 5, 6] have shown that for an optimal number of clusters some of these methods are able to assign data items to clusters quite accurately.  ... 
doi:10.2991/eusflat.2011.136 dblp:conf/eusflat/HimmelspachHC11 fatcat:o3525kusnne43fmvjmtfyf3bj4

Relational visual cluster validity (RVCV)

Yunfei Ding, Robert F. Harrison
2007 Pattern Recognition Letters  
The assessment of cluster validity plays a very important role in clustering analysis.  ...  There are very few validity methods which can be used to analyze the clustering validity of relational data.  ...  The initial number of clusters c of the first experiment using six points is equal to 2, 3 and 4 (for c = 2 and c = 3, each cluster has an equal number of samples; for c = 4, two of the four clusters have  ... 
doi:10.1016/j.patrec.2007.06.002 fatcat:pcyfvbuckfbynfnkumjdbj4kbm