
Exhaustive and Efficient Constraint Propagation: A Graph-Based Learning Approach and Its Applications

Zhiwu Lu, Yuxin Peng
2012 International Journal of Computer Vision  
The resulting exhaustive set of propagated pairwise constraints is further used to adjust the similarity matrix for constrained spectral clustering.  ...  Considering that this time cost is proportional to the number of all possible pairwise constraints, our approach actually provides an efficient solution for exhaustively propagating pairwise constraints  ...  In the following, since the pairwise constraints used for constrained spectral clustering (CSC) are obtained by our exhaustive and efficient constraint propagation (E²CP), the above clustering algorithm  ... 
doi:10.1007/s11263-012-0602-z fatcat:aly6m6seo5gx7ebqujzpj6xbca

An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints

Jinfeng Yi, Lijun Zhang, Tianbao Yang, Wei Liu, Jun Wang
2015 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15  
Given emerged new constraints, classical semi-supervised clustering algorithms need to re-optimize their objectives over all data samples and constraints in availability, which prevents them from efficiently  ...  applications such as social network and e-commerce system analysis.  ...  To tackle this challenging problem, in this paper we propose an efficient dynamic semi-supervised clustering framework for large-scale data mining applications [48, 22, 40, 41].  ... 
doi:10.1145/2783258.2783389 dblp:conf/kdd/Yi0YLW15 fatcat:x5jks4ponjh3rjurnwlvovgd6y

Approximate pairwise clustering for large data sets via sampling plus extension

Liang Wang, Christopher Leckie, Ramamohanarao Kotagiri, James Bezdek
2011 Pattern Recognition  
Pairwise clustering methods have shown great promise for many real-world applications. However, the computational demands of these methods make them impractical for use with large data sets.  ...  Extensive experimental results on several synthetic and real-world data sets demonstrate both the feasibility of approximately clustering large data sets and acceleration of clustering in loadable data  ...  The eNERF algorithm described in [3] performs approximate clustering for large pairwise relational data.  ... 
doi:10.1016/j.patcog.2010.08.005 fatcat:b2azjgg25vg75fx5rvbqbmzmku

Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce

Jimmy Lin
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
Each algorithm supports one or more approximations that trade effectiveness for efficiency, the characteristics of which are studied experimentally.  ...  However, the other two algorithms support approximations that yield large efficiency gains without significant loss of effectiveness.  ...  CONCLUSION This paper describes three algorithms for computing pairwise similarity on document collections and presents experimental results for the application of "more like this" queries in the life  ... 
doi:10.1145/1571941.1571970 dblp:conf/sigir/Lin09 fatcat:wuq4db7ouvhjblrqcwovu7gkny

An Effective and Efficient Approach for Clusterability Evaluation [article]

Margareta Ackerman, Andreas Adolfsson, Naomi Brownstein
2016 arXiv   pre-print
In this paper, we propose a novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure of real data.  ...  Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them impractical; most are computationally infeasible  ...  Our development of an effective and efficient approach for evaluating clusterability enables several important areas of investigation.  ... 
arXiv:1602.06687v1 fatcat:umvuvsw34zec7jmbw7aq2fllty

Non-Parametric Kernel Learning with robust pairwise constraints

Changyou Chen, Junping Zhang, Xuefang He, Zhi-Hua Zhou
2011 International Journal of Machine Learning and Cybernetics  
The reason is that for clustering, the pairwise constraints provide useful information about which data pairs are in the same category and which ones are not.  ...  However, the above clustering algorithms via kernel matrices either cannot scale well with the increasing number of pairwise constraints and the amount of data, or lack theoretical guarantee for the  ...  In addition, we will apply the proposed algorithm to more real applications, and explore more efficient algorithms for this problem since the current methods are not fast enough for large scale datasets  ... 
doi:10.1007/s13042-011-0048-6 fatcat:2nao4vg4f5eezkifccwzrys2bu

Relationship Matrix Nonnegative Decomposition for Clustering

Ji-Yuan Pan, Jiang-She Zhang
2011 Mathematical Problems in Engineering  
For a positive pairwise similarity matrix, symmetric NMF (SNMF) and weighted NMF (WNMF) can be used to cluster the data.  ...  However, both of them are not very efficient for the ill-structured pairwise similarity matrix.  ...  Thus, AA^T and ASA^T would be approximately block diagonal matrices. SNMF and WNMF learn good approximation for X. This is the simplest case in data clustering.  ... 
doi:10.1155/2011/864540 fatcat:bnxt5zotcnfalagq6h3pqnagkq

Similarity Preserving Representation Learning for Time Series Clustering [article]

Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon
2019 arXiv   pre-print
This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use.  ...  A considerable number of clustering algorithms take instance-feature matrices as their inputs.  ...  For instance, clustering the ED data set takes k-Shape, CLDS, and kMeans-DTW 20, 57, and 114 minutes, respectively.  ... 
arXiv:1702.03584v3 fatcat:ttvtnyunhzfr5b3nxgb5y2c3xe

Efficient SPectrAl Neighborhood blocking for entity resolution

Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng
2011 2011 IEEE 27th International Conference on Data Engineering  
We investigate the entity resolution problem for large data sets where efficient and scalable solutions are needed.  ...  Our experimental results with both synthetic and real-world data demonstrate that SPAN is robust and outperforms other blocking algorithms in terms of accuracy while it is efficient and scalable to deal  ...  data.  ... 
doi:10.1109/icde.2011.5767835 dblp:conf/icde/ShuCXM11 fatcat:mvysvekxkjacxcvgjtpst5hqyu

Efficient Boundary Values Generation in General Metric Spaces for Software Component Testing [chapter]

Alfredo Ferro, Rosalba Giugno, Alfredo Pulvirenti
2003 Lecture Notes in Computer Science  
We propose efficient approximate algorithms that for any k generate both k nucleus and perimeter elements.  ...  However, even a quadratic algorithm can be prohibitive due to a sufficiently large size of the input domain. Our approximate algorithms run in O(kn) time.  ...  We would like to thank Angelo Gargantini for useful discussions and suggestions.  ... 
doi:10.1007/978-3-540-39910-0_15 fatcat:gpqk66534bemjkqz6kg2tnnmvi

Performance of Windows Multicore Systems on Threading and MPI

Judy Qiu, Scott Beason, Seung-Hee Bae, Saliya Ekanayake, Geoffrey Fox
2010 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing  
We look at performance of two significant bioinformatics applications; gene clustering and dimension reduction.  ...  MPI is used between the cluster nodes (up to 32) and either threading or MPI for parallelism on the 24 cores of each node.  ...  Haixu Tang and Mina Rho gave us important feedback on Alu and Metagenomics data and we would like to thank coauthors on our conference paper [17] for their earlier work on which we build.  ... 
doi:10.1109/ccgrid.2010.105 dblp:conf/ccgrid/QiuBBEF10 fatcat:uw5utdknnjdydfeffaxmx5qfaq

Cost functions for pairwise data clustering

L. Angelini, L. Nitti, M. Pellicoro, S. Stramaglia
2001 Physics Letters A  
The partition provided by these cost functions identifies clusters with dense connected regions in data space; differences and similarities with respect to a well known cost function for pairwise clustering  ...  Cost functions for non-hierarchical pairwise clustering are introduced, in the probabilistic autoencoder framework, by the request of maximal average similarity between the input and the output of the  ...  We describe now the application of the variational criteria for clustering, described above, to some artificial and real data-sets.  ... 
doi:10.1016/s0375-9601(01)00373-5 fatcat:5ygtln5abzap7jgt7oexundmhy

Performance of windows multicore systems on threading and MPI

Judy Qiu, Seung-Hee Bae
2011 Concurrency and Computation  
We look at performance of two significant bioinformatics applications; gene clustering and dimension reduction.  ...  MPI is used between the cluster nodes (up to 32) and either threading or MPI for parallelism on the 24 cores of each node.  ...  Haixu Tang and Mina Rho gave us important feedback on Alu and Metagenomics data and we would like to thank coauthors on our conference paper [17] for their earlier work on which we build.  ... 
doi:10.1002/cpe.1762 fatcat:czc6qeoctncu7mm4oafqra3ppq

SAR Image Compression Using Integer to Integer Transformations, Dimensionality Reduction, and High Correlation Modeling

Sergey Voronin
2022 Journal of Computer and Communications  
In this document, we present new techniques for near-lossless and lossy compression of SAR imagery saved in PNG and binary formats of magnitude and phase data based on the application of transforms, dimensionality  ...  In particular, we discuss the use of blockwise integer to integer transforms, subsequent application of a dimensionality reduction method, and Burrows-Wheeler based lossless compression for the PNG data  ...  A clustering scheme can be utilized based on the correlation information between data vectors to identify similar data clusters and to approximate elements of each cluster.  ... 
doi:10.4236/jcc.2022.102002 fatcat:novvmbjzcbgaro6w7emvprp2te

Deterministic annealing for unsupervised texture segmentation [chapter]

Thomas Hofmann, Jan Puzicha, Joachim M. Buhmann
1997 Lecture Notes in Computer Science  
In this paper a rigorous mathematical framework of deterministic annealing and mean-field approximation is presented for a general class of partitioning, clustering and segmentation problems.  ...  We describe the canonical way to derive efficient optimization heuristics, which have a broad range of possible applications in computer vision, pattern recognition and data analysis.  ...  The method is presented in a unifying way for a larger class of partitioning problems and extends the pairwise clustering algorithm derived in [5] to sparse dissimilarity data.  ... 
doi:10.1007/3-540-62909-2_82 fatcat:sb6mjuptdjfddeklbal4toc6mq