Toward the highest effectiveness in text description-based service retrieval

Isaac B. Caicedo-Castro, Marie-Christine Fauvet, Ahmed Labath, Helga Duarte-Amaya
2015 Document Numérique  
From the results of the experiments, we conclude that the IR model of this family, which is based on query expansion via a co-occurrence thesaurus outperforms the effectiveness of all the models studied  ...  Therefore, we have implemented this model in a text description-based service search engine, which is part of a system designed to provide nomad users with services that fulfil users' needs expressed in  ...  Another hybrid approach where K-means algorithm is used to divide the corpus in several clusters of documents is proposed (see (Pan, Zhang, 2009)).  ... 
doi:10.3166/dn.18.2-3.155-177 fatcat:bmlz2vfo7ncntmuvc3nuonaqlm

Semi-Supervised Linear Discriminant Clustering

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Fu-Sheng Gou
2014 IEEE Transactions on Cybernetics  
The goal is to find a feature space where the K-means can perform well in the new space.  ...  The proposed algorithm considers clustering and dimensionality reduction simultaneously by connecting K-means and linear discriminant analysis (LDA).  ...  [12], [13], and clustering-based approaches [14]-[16].  ... 
doi:10.1109/tcyb.2013.2278466 pmid:23996591 fatcat:dpxxp6lcyraxhb2rzrbryy2pqa

Clustering with Balancing Constraints [chapter]

Joydeep Ghosh, Ayhan Demiriz
2008 Constrained Clustering  
data such as text documents.  ...  This chapter describes several approaches to obtaining balanced clustering results that also scale well to large data sets.  ...  This research was supported in part by the Digital Technology Center Data Mining Consortium (DDMC) at the University of Minnesota, Twin Cities, and NSF grants IIS 0307792 and III-0713142.  ... 
doi:10.1201/9781584889977.ch8 fatcat:kj5gtm37ebbmtcvk3zw2dw2bde

Seed-Guided Deep Document Clustering [chapter]

Mazar Moradi Fard, Thibaut Thonet, Eric Gaussier
2020 Lecture Notes in Computer Science  
This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches.  ...  In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach.  ...  Conclusion We have introduced in this paper the SD2C framework, the first attempt, to the best of our knowledge, to constrain document clustering with seed words using a deep clustering approach.  ... 
doi:10.1007/978-3-030-45439-5_1 fatcat:cug7brgy6bdxzcrwynaiarcz6y

Clustering Genes Using Heterogeneous Data Sources

Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan
2010 International Journal of Knowledge Discovery in Bioinformatics  
For the constrained clustering algorithm, we have studied the effectiveness of various constraints sets.  ...  To deal with incomplete data sources, we have adopted the MPCK-means clustering algorithm, which is a constrained clustering algorithm, to perform exploratory analysis on one complete source (such as gene  ...  In the second approach, the spherical K-means algorithm, which is a K-means algorithm using cosine-based distance, was applied to the gene-term matrix T times (we chose T = 100 here).  ... 
doi:10.4018/jkdb.2010040102 fatcat:i65e5huzurcord6yaojw44jknu

The optimum clustering framework: implementing the cluster hypothesis

Norbert Fuhr, Marc Lechtenfeld, Benno Stein, Tim Gollub
2011 Information retrieval (Boston)  
Key idea is to base cluster analysis and evaluation on a set of queries, by defining documents as being similar if they are relevant to the same queries.  ...  In this paper, we present a theoretic foundation for optimum document clustering.  ...  Other clustering approaches that are based on more advanced document representations use the document features in the collection (or a subset thereof) as queries.  ... 
doi:10.1007/s10791-011-9173-9 fatcat:6vg4ismou5hunihe7mz5kvspe4

Large-scale multi-dimensional document clustering on GPU clusters

Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas Potok
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
of flocking-based document clustering.  ...  This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state.  ...  Acknowledgement This work was supported in part by NSF grant CCF-0429653, CCR-0237570 and a subcontract from ORNL. The  ... 
doi:10.1109/ipdps.2010.5470429 dblp:conf/ipps/ZhangMCP10 fatcat:i43znbjewfaxzbb73lxwjh6kpi

Clustering tagged documents with labeled and unlabeled documents

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen
2013 Information Processing & Management  
This study employs our proposed semi-supervised clustering method called Constrained-PLSA to cluster tagged documents with a small amount of labeled documents and uses two data sets for system performance  ...  The first data set is a document set whose boundaries among the clusters are not clear; while the second one has clear boundaries among clusters.  ...  In addition to the above K-means variant approaches, there are many semi-supervised clustering approaches that are extended from the other algorithms.  ... 
doi:10.1016/j.ipm.2012.12.004 fatcat:kdmjhrvg6fhnhpidrgrfq4dhpq

Neural Gas Clustering Adapted for Given Size of Clusters

Iveta Dirgová Luptáková, Marek Šimon, Ladislav Huraj, Jiří Pospíchal
2016 Mathematical Problems in Engineering  
Common clustering approaches cannot impose constraints on sizes of clusters. However, in many applications, sizes of clusters are bounded or known in advance.  ...  The convergence of algorithm towards an optimum is tested on simple illustrative examples.  ...  Since we do not allow the cluster size constraints to be relaxed, we did not compare our adapted neural gas algorithm with a constrained k-means algorithm but we compared our algorithm with balanced k-means  ... 
doi:10.1155/2016/9324793 fatcat:mbtsbmv6onb53atxdhry467mi4

Semi-supervised model-based document clustering: A comparative study

Shi Zhong
2006 Machine Learning  
The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2000).  ...  We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for multinomial model-based deterministic  ...  Basu et al. (2002) compared seeded spherical k-means and constrained spherical k-means for clustering documents and showed that the constrained version performs better.  ... 
doi:10.1007/s10994-006-6540-7 fatcat:n52fgsxlgfhgxkmzgtpuejpy2a

Characterizing pattern preserving clustering

Hui Xiong, Michael Steinbach, Arifin Ruslim, Vipin Kumar
2008 Knowledge and Information Systems  
Experimental results on document data show that HICAP can produce overlapping clusters that preserve useful patterns, but has relatively worse clustering performance than bisecting K-means with respect  ...  By contrast, in terms of entropy, K-CAP can perform substantially better than the bisecting K-means algorithm when data sets contain clusters of widely different sizes, a common situation in the real-world  ...  Constrained clustering (Tung, Ng, Lakshmanan and Han, 2001) is based on the idea of using standard clustering approaches, but restricting the clustering process.  ... 
doi:10.1007/s10115-008-0148-0 fatcat:d23657nmerdpfe43ivfexq7efq

XML data clustering

Alsayed Algergawy, Marco Mesiti, Richi Nayak, Gunter Saake
2011 ACM Computing Surveys  
In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content.  ...  The presence of such a huge amount of approaches is due to the different applications requiring the XML data to be clustered.  ...  In this phase, a partitional clustering algorithm based on a modified version of k-means is used. -Evaluation Criteria.  ... 
doi:10.1145/1978802.1978804 fatcat:zgparleb6nbkdnoxlcxn3vyrhm

Scalable, Balanced Model-based Clustering [chapter]

Shi Zhong, Joydeep Ghosh
2003 Proceedings of the 2003 SIAM International Conference on Data Mining  
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes.  ...  Instead of a maximum-likelihood (ML) assignment, a balance-constrained approach is used for the sample assignment step.  ...  In this paper, we take a balance-constrained approach built upon the framework of probabilistic, model-based clustering [40].  ... 
doi:10.1137/1.9781611972733.7 dblp:conf/sdm/ZhongG03 fatcat:5bjfvo2u2baz7cthogdgcqievi

Co-Bidding Graphs for Constrained Paper Clustering

Tadej Škvorc, Nada Lavrač, Marko Robnik-Šikonja, Marc Herbstritt
2016 Symposium on Languages, Applications and Technologies  
We present a two-tier constrained clustering method for automatic conference scheduling that can automatically assign paper presentations into predefined schedule slots instead of requiring the program  ...  We demonstrate a methodology which is capable of enriching textual information with graph-based data and utilizing both in an innovative machine learning application of clustering.  ...  A methodology for mining document-enriched heterogeneous information networks. The Computer Journal, 2012. 7 John A Hartigan and Manchek A Wong. Algorithm AS 136: A k-means clustering algorithm.  ... 
doi:10.4230/oasics.slate.2016.1 dblp:conf/slate/SkvorcLR16 fatcat:5mnubxxg4vbl7pnednbezylnka

Improving document clustering using automated machine translation

Xiang Wang, Buyue Qian, Ian Davidson
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
In this work, we propose an alternative approach to address this problem using the constrained clustering framework.  ...  This gives rise to an intriguing question: can we use the extra information to achieve a better clustering of the documents?  ...  research via ONR grants N00014-09-1-0712 Automated Discovery and Explanation of Event Behavior, N00014-11-1-0108 Guided Learning in Dynamic Environments and NSF Grant NSF IIS-0801528 Knowledge Enhanced Clustering  ... 
doi:10.1145/2396761.2396844 dblp:conf/cikm/WangQD12 fatcat:k3idu2evvvexxhpb23gvezvgam