Text Categorization based on Clustering Feature Selection

Xiaofei Zhou, Yue Hu, Li Guo
2014 Procedia Computer Science  
In this paper, we discuss a text categorization method based on k-means clustering feature selection.  ...  On three normal text databases, classifiers based on our feature selection method exhibit better performances than original classifiers for text categorization.  ...  Experimental results of text categorization with KMF on DBworld dataset On Farm-ads dataset (see Fig. 3.), the left of figure (a)(b)(c) show the results with different number of cluster centroids (k=1,2  ... 
doi:10.1016/j.procs.2014.05.283 fatcat:k43nlrpp2baulgdmgqrrc4i6se

Evaluation of text document clustering approach based on particle swarm optimization

Stuti Karol, Veenu Mangat
2013 Open Computer Science  
Text Document Clustering refers to the clustering of related text documents into groups based upon their content.  ...  Clustering, an extremely important technique in Data Mining, is an automatic learning technique aimed at grouping a set of objects into subsets or clusters.  ...  Examples of search based techniques used to approach clustering as optimization problems are SA (Simulated Annealing) and Tabu Search.  ... 
doi:10.2478/s13537-013-0104-2 fatcat:aqkdvyg5cvfpfgk6amqghgs6fq

Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy

Waheeb, Khan, Chen, Shang
2020 Information  
A key challenging issue in text mining is text summarization, so we propose an unsupervised score-based method which combines the vector space model, continuous bag of words (CBOW), clustering, and a statistically-based  ...  Lastly, we use weighted principal component analysis (W-PCA) to map the sentences' encoded weights based on a list of features.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/info11020059 fatcat:qhu772l35zaqxntyskuynkl3ve

An Efficient EM based Ontology Text-mining to Cluster Proposals for Research Project Selection

D. Saravana Priya, M. Karthikeyan
2014 Research Journal of Applied Sciences Engineering and Technology  
Ontology's build the task of searching alike pattern of text that to be more effectual, efficient and interactive.  ...  The present method for combine proposals for selection of research project is proposed by ontology based text mining technique to the data mining approach of cluster research proposals support on their  ...  target list of categories, too ('supervised categorization' or 'text classification'; (Fabrizio, 2002).  ... 
doi:10.19026/rjaset.8.1118 fatcat:bmd5oayy6vg6le57j6oofj5tey

An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text

M. John Basha, K.P. Kaliyamurthie
2017 International Journal of Electrical and Computer Engineering (IJECE)  
Text clustering plays a key role in navigation and browsing process. For an efficient text clustering, the large amount of information is grouped into meaningful clusters.  ...  To address these issues, an efficient text based clustering framework is proposed. The Reuters dataset is chosen as the input dataset.  ...  Lee, et al [2] suggested a fuzzy based method to classify the text present in multi category document.  ... 
doi:10.11591/ijece.v7i1.pp551-558 fatcat:uczortzfc5hzjklygrfbplvoze

YouTubeCat: Learning to categorize wild web videos

Zheshen Wang, Ming Zhao, Yang Song, Sanjiv Kumar, Baoxin Li
2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
Then, using the hierarchical taxonomy of the categories, a Conditional Random Field (CRF) based fusion strategy is designed.  ...  We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages.  ...  For example, in Figure 3, M Sc is a content-based and M St is a text-based model for the combination of manually-labeled data and searched data.  ... 
doi:10.1109/cvpr.2010.5540125 dblp:conf/cvpr/WangZSKL10 fatcat:e6ieenc53nhcliadzps6z4p3eu

Content-based Text Categorization using Wikitology [article]

Muhammad Rafi, Sundus Hassan, Mohammad Shahid Shaikh
2012 arXiv   pre-print
A major computational burden, while performing document clustering, is the calculation of similarity measure between a pair of documents.  ...  The experimental studies on the text mining datasets reveal that this new similarity measure is more effective as compared to commonly used similarity measures in text clustering.  ...  We would also like to thank National University of Computer & Emerging Sciences, for its kind support in carrying out this research project.  ... 
arXiv:1208.3623v1 fatcat:ol22a5el5vdivlintokuagiun4

Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

Michael Roth, Anette Frank
2012 Conference on Empirical Methods in Natural Language Processing  
Secondly, we present a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts.  ...  The contribution of this work is two-fold: We first construct a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments.  ...  We motivate the choice of the corpus and present a strategy for extracting comparable text pairs.  ... 
dblp:conf/emnlp/RothF12 fatcat:i7o545v4mngtdiqjlynk3erkaq

Multi-modality web video categorization

Linjun Yang, Jiemin Liu, Xiaokang Yang, Xian-Sheng Hua
2007 Proceedings of the international workshop on Workshop on multimedia information retrieval - MIR '07  
Specifically, we propose two novel modalities including a semantic modality and a surrounding text modality, as effective complements to most commonly used low-level features.  ...  This paper reports a first comprehensive study and large-scale test on web video (so-called user generated video or micro video) categorization.  ...  Finally a set of rule based classifiers and a classifier combination strategy are used to get the final result.  ... 
doi:10.1145/1290082.1290119 dblp:conf/mir/YangLYH07 fatcat:lpiiw2e3kvbuleejndhhcns3zu

Automatic Document Categorization [chapter]

Benno Stein, Sven Meyer zu Eissen
2003 Lecture Notes in Computer Science  
The categorization performance of a document clustering algorithm can be captured by the F-Measure, which quantifies how close a human-defined categorization has been resembled.  ...  Clustering a document collection is the current approach to automatically derive underlying document categories.  ...  Clustering is a key concept in automatic document categorization and means grouping together texts with similar topics [5, 42]. It can serve several purposes: 1.  ... 
doi:10.1007/978-3-540-39451-8_19 fatcat:5u4s7llxezextgwdufe6gcqow4

A Novel Method of Spam Mail Detection using Text Based Clustering Approach

M. Basavaraju, Dr. R. Prabhakar
2010 International Journal of Computer Applications  
A new spam detection technique using the text clustering based on vector space model is proposed in this research paper.  ...  Each cluster is abstracted using one or more representatives. It models data by its clusters. Clustering is a type of classification imposed on a finite set of objects.  ...  IX COMPARISONS OF BIRCH & K-MEANS WITH DATASETS Sensitivity to input pattern of dataset Yes No Cluster Quality (center location, number of data point in a cluster, radii of clusters) Finally, comparisons  ... 
doi:10.5120/906-1283 fatcat:uzy5ptk7g5hn5cc2wult24jchy

Associative Classification in Text Categorization [chapter]

Jian Chen, Jian Yin, Jun Zhang, Jin Huang
2005 Lecture Notes in Computer Science  
Text categorization has become one of the key techniques for handling and organizing text data. This model is used to classify new article to its most relevant category.  ...  In this paper, we propose a novel associative classification algorithm ACTC for text categorization.  ...  Then we can use these clusters as features for text categorization in real applications.  ... 
doi:10.1007/11538059_107 fatcat:msfzsvaq4nf5jmxlpqq5ba7bgy

Improved Nearest Neighbour Approach for Document Categorization

Rimpy Wadhawan, Saurabh Mittal
2017 International Journal of Emerging Research in Management and Technology  
Text Categorization is an issue in Data mining. The application of Text clustering can be categorized to two types, online and offline.  ...  It is an approach of machine learning in the form of Natural Language Processing (NLP). The task is to assign a text to one or more classes or categories.  ...  The goal of text categorization is to classify a set of documents into a fixed number of predefined categories. Each document may belong to more than one class.  ... 
doi:10.23956/ijermt/v6n2/130 fatcat:vgs3wiz5ovdp7nv56uylvtke6e

Co-clustering for Auditory Scene Categorization

Rui Cai, Lie Lu, Alan Hanjalic
2008 IEEE transactions on multimedia  
Moreover, we also extend the co-clustering scheme with a strategy based on the Bayesian information criterion (BIC) to automatically estimate the numbers of clusters.  ...  Co-clustering achieved a better performance compared to some traditional one-way clustering algorithms, both based on the low-level acoustic features and on the mid-level audio effect representations.  ...  Moreover, we extended the algorithm with a BIC-based strategy to automatically select the numbers of clusters in co-clustering.  ... 
doi:10.1109/tmm.2008.921739 fatcat:vox53lvwknhi5huwujb6jyyeju

Sequential patterns for text categorization

S. Jaillet, A. Laurent, M. Teisseire
2006 Intelligent Data Analysis  
Text categorization is a well-known task based essentially on statistical approaches using neural networks, Support Vector Machines and other machine learning algorithms.  ...  Texts are generally considered as bags of words without any order.  ...  Contrary to C4.5 [29], CN2 [9], or RIPPER [10], which use heuristic search to learn a subset of the regularities in data to build a classifier, CBA is based on exhaustive search and aims at finding  ... 
doi:10.3233/ida-2006-10302 fatcat:e74x4w3xgjdhdhc5ptaqngq5oi