29,262 Hits in 5.9 sec

Clustering Stable Instances of Euclidean k-means [article]

Abhratanu Dutta, Aravindan Vijayaraghavan, Alex Wang
2017 arXiv   pre-print
Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance).  ...  The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning.  ...  Practically interesting instances of the k-means clustering problem often have a clear optimal clustering solution (usually the ground-truth clustering) that is stable: i.e., it remains optimal even under  ... 
arXiv:1712.01241v1 fatcat:ouia3st3i5hfnd6klgzxeudnh4

Clustering under Perturbation Stability in Near-Linear Time [article]

Pankaj K. Agarwal and Hsien-Chih Chang and Kamesh Munagala and Erin Taylor and Emo Welzl
2020 arXiv   pre-print
For k-center and k-means problems, our algorithms also achieve polynomial dependence on the number of clusters, k, when α ≥ 2 + √3 + ϵ for any constant ϵ > 0 in any fixed dimension.  ...  An instance is α-stable if the underlying optimal clustering continues to remain optimal even when all pairwise distances are arbitrarily perturbed by a factor of at most α.  ...  If the k-means, k-median, or k-center instance for X under the Euclidean distance is α-stable for α ≥ 2 + √3, then the optimal clustering can be computed in Õ(nk² + k^(2d−1) · (k!)^k) time.  ... 
arXiv:2009.14358v1 fatcat:7xjoqmce4ra2plk3xb6xda7kee

On the Local Structure of Stable Clustering Instances [article]

Vincent Cohen-Addad, Chris Schwiegelshohn
2017 arXiv   pre-print
We study the classic k-median and k-means clustering objectives in the beyond-worst-case scenario.  ...  As a corollary we obtain that the widely-used Local Search algorithm has strong performance guarantees for both the tasks of recovering the underlying optimal clustering and obtaining a clustering of small  ...  Let (A, ℝ^d, ‖·‖₂, k) be an instance of Euclidean k-means clustering with optimal clustering C = {C*_1, ..., C*_k} and centers S = {c*_1, ..., c*_k}.  ... 
arXiv:1701.08423v3 fatcat:dbpdvi6kunaszixz45kvxbqbty

An exact algorithm for stable instances of the $ k $-means problem with penalties in fixed-dimensional Euclidean space

Fan Yuan, Dachuan Xu, Donglei Du, Min Li
2021 Journal of Industrial and Management Optimization  
We study stable instances of the k-means problem with penalties in fixed-dimensional  ...  instance of the k-means problem with penalties in fixed-dimensional Euclidean space can be solved accurately  ...  [14] study the stable instances of the k-means problem in fixed-dimensional Euclidean space and prove that for any fixed ε > 0, for a (1 + ε)-stable instance of the k-means problem, the optimal solution can  ... 
doi:10.3934/jimo.2021122 fatcat:ugtdhu6kxjf25akspgf44azh7i

Computational Feasibility of Clustering under Clusterability Assumptions [article]

Shai Ben-David
2015 arXiv   pre-print
The hope is that there will be clustering algorithms that are provably efficient on such 'clusterable' instances.  ...  It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out.  ...  An obvious, though relatively minor, question is: for which values of α does the problem of optimizing the k-means or k-median objectives for α-center stable instances become NP-hard for  ... 
arXiv:1501.00437v1 fatcat:ch3grn4l7jdd7najcivdofgf5e

Stability Yields a PTAS for k-Median and k-Means Clustering

Pranjal Awasthi, Avrim Blum, Or Sheffet
2010 2010 IEEE 51st Annual Symposium on Foundations of Computer Science  
We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant).  ...  From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median  ...  Any (1 + α)-weakly deletion-stable k-means instance is α/4-distributed.  ... 
doi:10.1109/focs.2010.36 dblp:conf/focs/AwasthiBS10 fatcat:we5wkl2ufjhu3llt5h7yz2pqqa

On Euclidean k-Means Clustering with alpha-Center Proximity

Amit Deshpande, Anand Louis, Apoorv Vikram Singh
2019 International Conference on Artificial Intelligence and Statistics  
k-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal k-means clusters are stable under additive or multiplicative perturbation of data.  ...  We study the problem of minimizing the Euclidean k-means objective only over clusterings that satisfy α-center proximity.  ...  Euclidean k-means clustering, where the size of each cluster is at least ωn/k, that satisfies the following properties: (i) if the Vertex-Cover instance has value k, the optimal α-center proximal k-means  ... 
dblp:conf/aistats/DeshpandeLS19 fatcat:7w3h7yslmvgkbe5kimjd7cs5vu

Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data

G. Valentini
2005 Bioinformatics  
We present a new R package for the assessment of the reliability of clusters discovered in high-dimensional DNA microarray data.  ...  (hierarchical, k-means, fuzzy k-means or prediction around medoids).  ...  For instance, the function Random.fuzzy.kmeans.validity applies the fuzzy k-means clustering algorithm to the data, computes the similarity matrix using multiple random subspace projections and then computes  ... 
doi:10.1093/bioinformatics/bti817 pmid:16332708 fatcat:qmizohuhqbh5xfvwt5rqxotby4

Individual Preference Stability for Clustering [article]

Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian
2022 arXiv   pre-print
As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces.  ...  We evaluate some of our algorithms and several standard clustering approaches on real data sets.  ...  JM is supported by funding from the NSF AI Institute for the Foundations of Machine Learning (IFML), an NSF CAREER award, and the Simons Collaborative grant on Theory of Algorithmic Fairness.  ... 
arXiv:2207.03600v1 fatcat:iqqfgusxwnhajc5u5nscvdyymq

Clustering as Data Mining Technique in Risk Factors Analysis of Diabetes, Hypertension and Obesity

Mohammed Gulam Ahamad, Mohammed Faisal Ahmed, Mohammed Yousuf Uddin
2018 European Journal of Engineering and Technology Research  
The simple k-means clustering technique is adopted to form ten clusters, which are clearly discernible and distinguish the differences among risk factors such as diabetes, obesity and hypertension.  ...  The cluster analysis technique is utilized to study the effects of diabetes, obesity and hypertension from the database obtained from the Virginia School of Medicine.  ...  1. An initial cluster seed value is set; these serve as the temporary cluster means. 2. The squared Euclidean distance from each instance is computed, and the instance is accordingly assigned to the nearest cluster. 3.  ... 
doi:10.24018/ejeng.2016.1.6.202 fatcat:ifkaecvjsfhu5jrf763h5fb5mm
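The numbered steps quoted in this entry are the core of the standard Lloyd's iteration for k-means. A minimal sketch of those steps (illustrative code, not from the paper; NumPy assumed available, and an empty cluster simply keeps its previous center):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Sketch of Lloyd's k-means following the three steps quoted above."""
    rng = np.random.default_rng(seed)
    # 1. Set initial cluster seeds: k data points serve as temporary cluster means.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # 2. Compute the squared Euclidean distance from each instance to each
        #    mean and assign the instance to the nearest cluster.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 3. Recompute each cluster mean; an empty cluster keeps its old center.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        # The clustering is stable once no point switches, i.e. means stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

On two well-separated groups of points this converges in a handful of iterations, recovering one cluster per group.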

Error Evaluation on K- Means and Hierarchical Clustering with Effect of Distance Functions for Iris Dataset

Harish Kumar Sagar, Varsha Sharma
2014 International Journal of Computer Applications  
In data clustering (a subfield of data mining), k-means and hierarchical clustering algorithms are popular due to their excellent performance in clustering of large data sets.  ...  The foremost objective of this paper is to divide the data objects into k different clusters with homogeneity within each cluster, while each cluster should be heterogeneous to the others.  ...  By then each cluster is stable and no switch of data points arises. Distance functions in k-means: in mathematics, the Euclidean distance or Euclidean metric is the "general  ... 
doi:10.5120/15066-3429 fatcat:folu35hppfgmzbdxgbhdw4bpn4
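The Euclidean distance function this entry evaluates is the general n-dimensional metric d(p, q) = sqrt(Σᵢ (pᵢ − qᵢ)²). A minimal sketch (function name is illustrative, not from the paper):

```python
import math

def euclidean(p, q):
    # General n-dimensional Euclidean metric: sqrt(sum_i (p_i - q_i)^2).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, euclidean((0, 0), (3, 4)) gives 5.0, the familiar 3-4-5 right triangle.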

Time-series clustering of cage-level sea lice data

Ana Rita Marques, Henny Forde, Crawford W. Revie, Thomas P. Adams
2018 PLoS ONE  
A series of strategies involving a combination of distance measures and prototypes were explored and cluster evaluation was performed using cluster validity indices.  ...  Repeated agreement on cluster membership for different combinations of distance and centroids was taken to be a strong indicator of clustering while the stability of these results reinforced this likelihood  ...  We also thank William Chalmers for editorial assistance in preparation of the manuscript.  ... 
doi:10.1371/journal.pone.0204319 fatcat:ipwf6uasubhsxikc4hwb2kf2gq

A set theory based similarity measure for text clustering and classification

Ali A. Amer, Hassan I. Abdalla
2020 Journal of Big Data  
Using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are carefully examined  ...  This study, in consequence, introduces a new highly-effective and time-efficient similarity measure for text clustering and classification.  ...  He also served there as head of the Quality Unit. Dr.  ... 
doi:10.1186/s40537-020-00344-3 fatcat:nnaqfhewsjgpvmjx3shpytms2m

SACOC:A Spectral-Based ACO Clustering Algorithm [chapter]

Héctor D. Menéndez, Fernando E. B. Otero, David Camacho
2015 Studies in Computational Intelligence  
The new algorithm, called SACOC, has been compared against well-known algorithms (K-means and Spectral Clustering) and with ACOC.  ...  At the same time, new clustering techniques that seek the continuity of data, especially spectral-based approaches in opposition to classical centroid-based approaches, have attracted an increasing  ...  K-means and ACOC are not able to define the continuity of the data due to the use of the Euclidean space.  ... 
doi:10.1007/978-3-319-10422-5_20 fatcat:n5zgfmgkbfbrzasdstperx7l6m

Laplacian Eigenmaps Dimensionality Reduction Based on Clustering-Adjusted Similarity

Honghu Zhou, Jun Wang
2019 Algorithms  
Euclidean distance between instances is widely used to capture the manifold structure of data and for graph-based dimensionality reduction.  ...  LE-CAS first performs clustering on all instances to explore the global structure and discrimination of instances, and quantifies the similarity between cluster centers.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/a12100210 fatcat:s4lq7jrwz5ardfajqiidhjocem
Showing results 1 — 15 out of 29,262 results