Efficient mixture model for clustering of sparse high dimensional binary data [article]

Marek Śmieja, Krzysztof Hajto, Jacek Tabor
2017 arXiv   pre-print
In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering.  ...  In contrast to classical mixture models based on EM algorithm, SparseMix: -is especially designed for the processing of sparse data, -can be efficiently realized by an on-line Hartigan optimization algorithm  ...  In this paper we introduce a version of model-based clustering, SparseMix, which efficiently processes high-dimensional sparse binary data.  ... 
arXiv:1707.03157v1 fatcat:a5og2ifvzbg6bby7r4nqacjf4q

Efficient mixture model for clustering of sparse high dimensional binary data

Marek Śmieja, Krzysztof Hajto, Jacek Tabor
2019 Data mining and knowledge discovery  
In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering.  ...  While most of the clustering methods are designed for continuous data, sparse high-dimensional binary representations became very popular in various domains such as text mining or cheminformatics.  ...  realized for sparse high-dimensional binary data.  ... 
doi:10.1007/s10618-019-00635-1 fatcat:w27lnkvxsrccjalpghgi4figia

A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data

Jinfeng Yi, Lijun Zhang, Jun Wang, Rong Jin, Anil K. Jain
2014 International Conference on Machine Learning  
Learning a statistical model for high-dimensional data is an important topic in machine learning.  ...  In this work, we focus on the problem of clustering high-dimensional data with sparse centers.  ...  Acknowledgement: This work was supported in part by the National Science Foundation (IIS-1251031) and the Office of Naval Research (N00014-11-1-0100 and N00014-12-1-0431).  ... 
dblp:conf/icml/Yi0WJJ14 fatcat:g33asfwhkzgrpcowfygfi4vm3a

Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures [article]

Martin Azizyan and Aarti Singh and Larry Wasserman
2014 arXiv   pre-print
The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA).  ...  We consider the problem of clustering data points in high dimensions, i.e. when the number of data points may be much smaller than the number of dimensions.  ...  Motivated by this example, we consider a simple non-spherical Gaussian mixture model (defined formally in the next section) for clustering high-dimensional data, and aim to provide a computationally efficient  ... 
arXiv:1406.2206v1 fatcat:fuhe65ojezbqvbr7dwe7zesdyq

Clustering Plotted Data by Image Segmentation [article]

Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou
2021 arXiv   pre-print
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data: by training neural networks to perform instance segmentation on plotted  ...  Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.  ...  Training the Binary Segmentation Model: To train the U-Net model for binary segmentation, we generated 1,000 synthetic datasets of blob-shaped clusters.  ... 
arXiv:2110.05187v1 fatcat:ytz3xj4a5fccnkiw3nwpvrkjla

Mining Projected Clusters in High-Dimensional Spaces

M. Bouguessa, Shengrui Wang
2009 IEEE Transactions on Knowledge and Data Engineering  
Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points.  ...  Our algorithm is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space.  ...  For this purpose, we used the class labels as ground truth and measured the accuracy of clustering by matching the points in input and output clusters.  ... 
doi:10.1109/tkde.2008.162 fatcat:6osss5zq4fbsrfms6jcn64adqi

Implicit Sparse Code Hashing [article]

Tsung-Yu Lin, Tsung-Wei Ke, Tyng-Luh Liu
2015 arXiv   pre-print
We address the problem of converting large-scale high-dimensional image data into binary codes so that approximate nearest-neighbor search over them can be efficiently performed.  ...  While the proposed formulation does not require computing any sparse codes, the underlying computation model still inevitably involves solving an unmanageable eigenproblem when extremely high-dimensional  ...  [32] propose a Sparse Projection-based (SP) binary coding scheme for high-dimensional data. They consider a sparsity regularizer to achieve efficiency in both storage and encoding.  ... 
arXiv:1512.00130v1 fatcat:3i3wqbb225dqdjl4iugywjpika

Model-based clustering of multivariate binary data with dimension reduction [article]

Michio Yamamoto, Kenichi Hayashi
2014 arXiv   pre-print
This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace to represent that cluster structure.  ...  The method is based on a finite mixture model of multivariate Bernoulli distributions, and each component is assumed to have a low-dimensional representation of the cluster structure.  ... 
arXiv:1406.3704v1 fatcat:cipvosczznavpkwcxf5znzbfwy

K-clustered tensor approximation

Yu-Ting Tsai, Zen-Chung Shih
2012 ACM Transactions on Graphics  
K-CTA not only extends previous work on Clustered Tensor Approximation (CTA) to exploit inter-cluster coherence, but also allows a compact and sparse representation for high-dimensional datasets with just  ...  With the increasing demands for photo-realistic image synthesis in real time, we propose a sparse multilinear model, which is named K-Clustered Tensor Approximation (K-CTA), to efficiently analyze and  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous reviewers for their profound comments and suggestions.  ... 
doi:10.1145/2167076.2167077 fatcat:2f43z2mngvfsndnakwwyzcamde

A Hierarchical Framework Using Approximated Local Outlier Factor for Efficient Anomaly Detection

Lin Xu, Yi-Ren Yeh, Yuh-Jye Lee, Jing Li
2013 Procedia Computer Science  
Experimental results verify the feasibility of our proposed method in terms of both accuracy and efficiency.  ...  We aim to detect anomalies by the accurate model and the approximated model learned at the remote server and sink nodes, respectively.  ...  LSH algorithm commits to reduce the dimensionality for high dimension data.  ... 
doi:10.1016/j.procs.2013.06.168 fatcat:2hylban5jnb5fhuw3fofcav4im

Introduction to the Issue on Robust Subspace Learning and Tracking: Theory, Algorithms, and Applications

T. Bouwmans, N. Vaswani, P. Rodriguez, R. Vidal, Z. Lin
2018 IEEE Journal on Selected Topics in Signal Processing  
To handle sparse noise in high dimensional data, Dong et al. design a robust tensor approximation (RTA) framework with Laplacian Scale Mixture (LSM) modeling for multi-dimensional data computationally  ...  For abrupt changes in the data, Jiao et al. design a subspace change-point detection where a stream of high-dimensional data points lie on a low-dimensional subspace.  ... 
doi:10.1109/jstsp.2018.2879245 fatcat:z3ohqdl37nat3pjo65fzsf2ady

Clustering of multivariate binary data with dimension reduction via L1-regularized likelihood maximization

Michio Yamamoto, Kenichi Hayashi
2015 Pattern Recognition  
The method is based on a finite mixture model of multivariate Bernoulli distributions, and each component is assumed to have a low-dimensional representation of the cluster structure.  ...  This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace to represent that cluster structure.  ...  Acknowledgment We thank the Editor and two anonymous reviewers for their constructive comments that helped to improve the quality of this article.  ... 
doi:10.1016/j.patcog.2015.05.026 fatcat:nqnqan25brcclmu62ilurrn4ei

Finding Uninformative Features in Binary Data [chapter]

Xin Wang, Ata Kabán
2005 Lecture Notes in Computer Science  
In this paper we propose and study a relatively simple cluster-based generative model for multivariate binary data, equipped with automated feature weighting capability.  ...  For statistical modelling of multivariate binary data, such as text documents, datum instances are typically represented as vectors over a global vocabulary of attributes.  ...  A lot of research has been devoted to dimensionality reduction, feature selection and feature weighting techniques for high-dimensional data, such as text [3, 2, 4].  ... 
doi:10.1007/11508069_6 fatcat:644zsjnbnraqlfstjqgtwxih6q

Efficient discovery of error-tolerant frequent itemsets in high dimensions

Cheng Yang, Usama Fayyad, Paul S. Bradley
2001 Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '01  
The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data.  ...  We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.).  ...  ACKNOWLEDGEMENTS We gratefully acknowledge Jeong Han Kim and Dimitris Achlioptas for discussions and assistance regarding the probability of error-tolerant itemsets occurring by chance.  ... 
doi:10.1145/502512.502539 fatcat:5lnfc7ngsnhadc3kf3bgtolga4

Models for association rules based on clustering and correlation

Carlos Ordonez
2009 Intelligent Data Analysis  
We show the sufficient statistics for clustering and correlation on binary data sets are the linear sum of points and the quadratic sum of points, respectively.  ...  Support bounds and support estimation obey the set downward closure property for fast bottom-up search for frequent itemsets. Both models can be efficiently computed with sparse matrix computations.  ...  For sparse binary data sets, such as transaction data sets, high dimensionality has a marginal impact on speed due to the efficient sparse matrix operations.  ... 
doi:10.3233/ida-2009-0369 fatcat:dpbz6qtg3fgkfh272gynnkpaky