Frequent term-based text clustering

Florian Beil, Martin Ester, Xiaowei Xu
2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02  
To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents.  ...  We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering.  ...  4 compared to bisecting k-means) for the WAP data (for 20 clusters).  ... 
doi:10.1145/775107.775110 fatcat:5ruxn5obwnau3ptabdmdpoxbve

Frequent term-based text clustering

Florian Beil, Martin Ester, Xiaowei Xu
2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02  
To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents.  ...  We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering.  ...  4 compared to bisecting k-means) for the WAP data (for 20 clusters).  ... 
doi:10.1145/775047.775110 dblp:conf/kdd/BeilEX02 fatcat:cnkrq6xvwvao5ncwqqksgh6mse

A segment-based approach to clustering multi-topic documents

Andrea Tagarelli, George Karypis
2012 Knowledge and Information Systems  
We empirically give evidence of the significance of our segment-based approach on large collections of multi-topic documents, and we compare it to conventional methods for document clustering.  ...  Existing methods for document clustering have traditionally assumed that a document is an indivisible unit for text representation and similarity computation, which may not be appropriate to handle documents  ...  Specifically, for disjoint clustering solutions, we use Spherical k-Means (Sk-Means) [12, 30, 52], whereas for overlapping clustering solutions, we use fuzzy Spherical k-Means (FSk-Means) [53] and  ... 
doi:10.1007/s10115-012-0556-z fatcat:istysfkpabglvpvyiz6il4vcwy

Comparative study of clustering techniques for short text documents

Aniket Rangrej, Sayali Kulkarni, Ashish V. Tendulkar
2011 Proceedings of the 20th international conference companion on World wide web - WWW '11  
We compare various document clustering techniques including K-means, SVD-based method and a graph-based approach and their performance on short text data collected from Twitter.  ...  We define a measure for evaluating the cluster error with these techniques.  ...  Table 1: Affinity propagation based method works best for short text document clustering. Table 2: Comparison of different distance measures (columns: 1-overlap, 2-overlap; row: Cluster-error 7.47%, 32.26%).  ... 
doi:10.1145/1963192.1963249 dblp:conf/www/RangrejKT11 fatcat:abjwc2fhlnbvxbsbjdni5xapyy

Clustering patents using non-exhaustive overlaps

Charles V. Trappey, Amy J.C. Trappey, Chun-Yi Wu
2010 Journal of Systems Science and Systems Engineering  
The non-exhaustive clustering approach allows for the clustering of patent documents with overlapping technical findings and claims, a feature that enables the grouping of patents that define related key  ...  A clustering algorithm with non-exhaustive overlaps is proposed to overcome deficiencies with exhaustive clustering methods used in patent mining and technology discovery.  ...  After clustering the patent documents, the proposed clustering algorithm is compared to the centroid based K-means method.  ... 
doi:10.1007/s11518-010-5134-x fatcat:wlqumj7bfnhpndgpwykvwbmd6q

A New Approach to Search Result Clustering and Labeling [chapter]

Anil Turel, Fazli Can
2011 Lecture Notes in Computer Science  
Our method emphasizes clustering quality by using cover coefficient-based and sequential k-means clustering algorithms.  ...  A cluster labeling method based on term weighting is also introduced for reflecting cluster contents.  ...  In this method, clustering and labeling steps are accomplished using suffix tree. Our search result clustering method, C3M+K-means is based on C3M and sequential k-means algorithms.  ... 
doi:10.1007/978-3-642-25631-8_26 fatcat:5kembxlry5g5nnibufxyyti7ta

Methodologies for Improved Tag Cloud Generation with Clustering [chapter]

Martin Leginus, Peter Dolog, Ricardo Lage, Frederico Durao
2012 Lecture Notes in Computer Science  
Tag clouds are useful means for navigation in the social web systems. Usually the systems implement the tag cloud generation based on tag popularity which is not always the best method.  ...  We show that by extending cloud generation based on tag popularity with clustering we slightly improve coverage.  ...  This work has been supported by FP7 ICT project M-Eco: Medical Ecosystem Personalized Event-Based Surveillance under grant No. 247829.  ... 
doi:10.1007/978-3-642-31753-8_5 fatcat:33pltpzhxbaofig567iqdrpwci

Semantic based Document Clustering: A Detailed Review

Neepa Shah, Sunita Mahajan
2012 International Journal of Computer Applications  
The PSO algorithm can be used to generate initial cluster centroids for the K-means, the major requirement of K-means algorithm.  ...  The most well-known partitioning methods are the K-means and its variants [4]. The basic K-means method initially allocates a set of objects to a number of clusters randomly.  ... 
doi:10.5120/8202-1598 fatcat:mb5hph2d6vhofmyxuyib7srgqq

Using bi-modal alignment and clustering techniques for documents and speech thematic segmentations

Dalila Mekhaldi, Denis Lalanne, Rolf Ingold
2004 Proceedings of the Thirteenth ACM conference on Information and knowledge management - CIKM '04  
This bi-modal method is suitable for multimodal applications that are centered on documents, such as meetings and lectures, where documents can be aligned with meeting dialogs.  ...  In this paper, we describe a new method for a simultaneous thematic segmentation of the meeting dialogs and the documents discussed or visible throughout the meeting.  ...  • Partitioning methods, where the data set is decomposed directly to a set of clusters, so that each datum belongs to only one cluster, e.g. the K-Means method [11].  ... 
doi:10.1145/1031171.1031185 dblp:conf/cikm/MekhaldiLI04 fatcat:syuhfigmnvgufoqjmf3mwy4auq

Efficient Prediction-Based Validation for Document Clustering [chapter]

Derek Greene, Pádraig Cunningham
2006 Lecture Notes in Computer Science  
Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation.  ...  In this paper we present an efficient prediction-based validation approach suitable for application to large, high-dimensional datasets such as text corpora.  ...  To tackle the computational issues of stability analysis, we now introduce an efficient prediction-based validation method suitable for use in document clustering tasks.  ... 
doi:10.1007/11871842_65 fatcat:7ilyypl6cfgnverrtahtyck5pu

Urdu Documents Clustering with Unsupervised and Semi-Supervised Probabilistic Topic Modeling

Mubashar Mustafa, Feng Zeng, Hussain Ghulam, Hafiz Muhammad Arslan
2020 Information  
For the datasets, two conditions are considered for document clustering, one is "Dataset without overlapping" in which all classes have distinct nature.  ...  We apply the proposed model and other methods to Urdu news datasets for categorizing.  ...  K-Means: This algorithm was first proposed in 1957 while the term "K-means" was first used in 1967 [12]. K-means clustering is a method that is commonly used for cluster analysis in data mining.  ... 
doi:10.3390/info11110518 fatcat:4t3pre3d2vegzf2kojgeqhflsu

Sentence clustering in text document using fuzzy clustering algorithm

S Sruthi, L Shalini
2014 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)  
Instead of searching entire documents for relevant information, these clusters will improve the efficiency and avoid overlapping of contents.  ...  Performance measures document clustering and its application in document summarization. When number of documents increases, time and second is less in k-means when compared with birch.  ...  LITERATURE REVIEW Given an integer K, K-means partitions the data set into K non-overlapping clusters.  ... 
doi:10.1109/iccicct.2014.6993192 fatcat:spgmj3n7xrcevlh7dknfdutufy

Document overlapping clustering using formal concept analysis

2016 Journal of Advances in Technology and Engineering Research  
In previous studies, graph-based clustering algorithm is one of common methods to build overlapping clusters.  ...  For the FCubed metric, HAC-related algorithms are better than K-means.  ... 
doi:10.20474/jater-2.2.1 fatcat:krvtrms3gfhu5gf7sfzr7djuxa

Web document clustering

Oren Zamir, Oren Etzioni
1998 Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98  
A key requirement is that the methods create their clusters based on the short snippets returned by Web search engines.  ...  based on phrases shared between documents.  ...  Karp and Zhenya Sigal for their contributions to this research. We thank Jody Daniels, Marti Hearst, David Lewis and Yoelle Maarek for commenting on earlier draft of this paper.  ... 
doi:10.1145/290941.290956 dblp:conf/sigir/ZamirE98 fatcat:2vz7rideiva2fglnbycymehehi

Using Data Fusion for a Context Aware Document Clustering

P. Venkateshkumar, A. Subramani
2013 International Journal of Computer Applications  
In this paper, a new method for clustering documents is proposed. In the proposed method, the term frequency of the document collection is computed and contexts based terms are fused.  ...  Agglomerative clustering and Bisecting K-Means are used to cluster the extracted features.  ...  RELATED WORKS Singh et al. [11] presented performance evaluation for clustering text documents based on K-means, heuristic K-means and fuzzy C-means algorithms.  ... 
doi:10.5120/12497-7430 fatcat:dfd5edpk7bbnddmnu5cyhqoil4