Unsupervised Feature Selection for Text Data [chapter]

Nirmalie Wiratunga, Rob Lothian, Stewart Massie
2006 Lecture Notes in Computer Science  
Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data.  ...  This utility informs the search for both representative and diverse features in two complementary ways: CLUSTER divides the entire feature space, before then selecting one feature to represent each cluster  ...  Introduction The volume of text content on the Internet and the widespread use of email-based communication have created a need for text classification, clustering and retrieval tools.  ... 
doi:10.1007/11805816_26 fatcat:k3lxfgcazfgbnabfzhyut6cdae

Text Document Clustering Using Dimension Reduction Technique

A. Sudha Ramkumar, B. Poorna
2016 International Journal of Applied Engineering Research  
Text document clustering is used to group a set of documents based on the information it contains and to provide retrieval results when a user browses the internet.  ...  Experimental evidences have shown that Information Retrieval applications can benefit from document clustering and it has been used as a tool to improve the performance of retrieval of information.  ...  Information gain (IG) is an effective Feature Selection method and is widely used in text data mining.  ... 
doi:10.37622/ijaer/11.7.2016.4770-4774 fatcat:crgalyqoyrg55b3z4iyhbz5la4


2016 International Journal of Advance Engineering and Research Development  
In this paper we discuss several approaches of text categorization, feature selection methods and applications of text categorization based on similarity.  ...  , by automatically extracting information from different written assets. An efficient and effective text document classification is becoming a challenging and highly required area to capably categorize  ...  Then we propose a new feature selection method called "Term Contribution (TC)" and perform a comparative study on a variety of feature selection methods for text clustering, including Document Frequency  ... 
doi:10.21090/ijaerd.c68 fatcat:6ugn6sujmbdlpd75hlsctavw4u

Automatic speech data clustering with human perception based weighted distance

Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Lianhong Cai, Weifeng Li
2014 The 9th International Symposium on Chinese Spoken Language Processing  
Moreover, x-means method clusters the data according to a pre-defined distance measurement considering different features.  ...  To address the problem, this paper proposes a method based on x-means clustering, an extended version of k-means without fixed number of classes, for the task.  ...  X-means method clusters the data according to a pre-defined distance measurement considering different features.  ... 
doi:10.1109/iscslp.2014.6936604 dblp:conf/iscslp/WuWJMCL14 fatcat:maqxpdihfvaepbrr5xraoolvsu

The Hybrid Feature Selection k-means Method for Arabic Webpage Classification

Hanan Alghamdi, Ali Selamat
2014 Jurnal Teknologi  
) to build a hybrid feature selection model (Hybrid-FS) for k-means clustering.  ...  Therefore, in this paper, we propose a feature selection model that incorporates three different feature selection methods (CHI-squared, mutual information, and term frequency-inverse document frequency  ...  Acknowledgement The authors would like to extend their thanks to Universiti Teknologi Malaysia (UTM) Research University funding Vot 03H02 and Ministry of Higher Education, Saudi Arabia, for supporting  ... 
doi:10.11113/jt.v70.3518 fatcat:orgm2idhifck3lu5cpupbb7sae

Text Categorization based on Clustering Feature Selection

Xiaofei Zhou, Yue Hu, Li Guo
2014 Procedia Computer Science  
In this paper, we discuss a text categorization method based on k-means clustering feature selection.  ...  K-means is classical algorithm for data clustering in text mining, but it is seldom used for feature selection.  ...  Conclusions In this paper, we use k-means clustering method to collect and choose features for text categorization.  ... 
doi:10.1016/j.procs.2014.05.283 fatcat:k43nlrpp2baulgdmgqrrc4i6se

Concept Features Extraction and Text Clustering Analysis of Neural Networks Based on Cognitive Mechanism [chapter]

Lin Wang, Minghu Jiang, Shasha Liao, Beixing Deng, Chengqing Zong, Yinghua Lu
2006 Lecture Notes in Computer Science  
SOM can be used in text clustering in large scales and the clustering results are good when the concept feature is selected.  ...  The feature selection is an important part in automatic classification. In this paper, we use the HowNet to extract the concept attributes, and propose CHI-MCOR method to build a feature set.  ...  It is a function of the ratio of the sum for within-cluster distance and between-cluster distance.  ... 
doi:10.1007/11816157_23 fatcat:zpmmljrfvra6vpkht4b4xj3hhi

Unsupervised Text Topic-Related Gene Extraction for Large Unbalanced Datasets

2020 Tehnički Vjesnik  
As a result, the selected features cannot truly reflect the information of the original data set, which thus affects the subsequent performance of classifiers.  ...  Then, considering the influence of the unbalanced distribution of sample clusters on feature selection, the CHI statistical matrix feature selection method, which combines average local density and information  ...  The data distribution relationship is used for feature selection.  ... 
doi:10.17559/tv-20191111095139 fatcat:rmrafzdjc5c4ljhzyfuioylbfa

Evaluation of the Performance and Efficiency of the Automated Linguistic Features for Author Identification in Short Text Messages Using Different Variable Selection Techniques

Refat Aljumily
2018 Studies in Media and Communication  
The relationships between known and anonymous text messages were examined using hierarchical linear and non-hierarchical nonlinear clustering methods, taking into accounts the nonlinear patterns among  ...  The evaluation of the 16 linguistic feature types differ from those of other analyses because the study used different variable selection methods including feature type frequency, variance, term frequency  ...  Specifically: SOM is a nonlinear method based on preservation of data topology; Complete and Flexible Beta clustering are both linear methods based on preservation of distance relations in data space,  ... 
doi:10.11114/smc.v6i2.3892 fatcat:xvyh6zbejjhb5l3s45ek7jo2s4

An Improved K-Lion Optimization Algorithm With Feature Selection Methods for Text Document Cluster

Jagatheeshkumar. G, S. Selva Brunda
2018 International Journal of Computer Sciences and Engineering  
Text Document clustering is a passion or an interested area of data mining. Many of the clustering method needed for a new one requires better clustering approaches.  ...  A new proposal is an improved KLOA with feature selection method for text mining that is Improved KLOA. K-means is one of the active algorithms for wider application of clustering technique.  ...  High dimensionality of data takes over efficiency and effectiveness point of view in feature selection algorithm [11]. A cluster based approach for good feature selection evaluated using minimum variance  ... 
doi:10.26438/ijcse/v6i7.245251 fatcat:kfvrbzaqnnfj7he3dob73likmm

A K-means clustering method with feature learning for unbalanced vehicle fault diagnosis

Bo Wang, Guanwei Wang, Youwei Wang, Zhengzheng Lou, Shizhe Hu, Yangdong Ye
2021 Smart and Resilient Transport  
Design/methodology/approach This study proposes a novel K-means with feature learning based on the feature learning K-means-improved cluster-centers selection (FKM-ICS) method, which includes the ICS and  ...  The ICS enables the FKM-ICS method to exclude the effect of outliers, solves the disadvantages of the fault text data contained a certain amount of noisy data, which effectively enhanced the method stability  ...  Therefore, we propose a method for initial cluster center selection based on density and distance based on the density peak method idea.  ... 
doi:10.1108/srt-01-2021-0003 fatcat:ypzc4i4esnbgddfh3iajevspza

Opinion Texts Clustering Using Manifold Learning Based on Sentiment and Semantics Analysis

Sajjad Jahanbakhsh Gudakahriz, Amir Masoud Eftekhari Moghadam, Fariborz Mahmoudi, Sikandar Ali
2021 Scientific Programming  
Manifold learning is a powerful tool for nonlinear dimension reduction of high-dimensional data.  ...  This type of clustering helps users of opinion texts to obtain more useful information from texts and also provides more accurate summaries in applications, such as the summarization of opinion texts.  ...  In [29], another method was presented for clustering of microblog texts, which uses the feature selection technique as a dimension reduction method.  ... 
doi:10.1155/2021/7842631 fatcat:nepcbzgcqremhhb5qkvgd3276i

An Advanced Fuzzy Constructing Algorithm for Feature Discovery in Text Mining

Evana Ramalakshmi, Subhakar Golla
2015 International Journal of Computer Applications  
It is a big task to provide the accuracy of discovered relevance features in text documents for describing user requirements.  ...  Classification of data is biggest issue in more text documents because they have large number of words and data patterns. Most existing popular methods are used by word-based approaches.  ...  Most of the feature selection methods are used the bag of words representation to select a set of features for the multi-class problem.  ... 
doi:10.5120/ijca2015906720 fatcat:e2r7yyw4tnavbi2ldtwz7w25ay

Information Extraction using Tokenization and Clustering Methods

2019 International journal of recent technology and engineering  
Text mining is a method for extracting meaningful information from large volume of data. Unstructured text is easily processed by humans but it is harder for machines.  ...  Text mining task involve methods such as tokenization, feature extraction and clustering.  ...  Next, a truncation method is used. In this method, the selected features are truncated to the nearest possible value. So, all the features that remain closer have a common truncated value.  ... 
doi:10.35940/ijrte.d7943.118419 fatcat:j7wzu4sq3ffs5gbgzyatjakdwq

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

2020 KSII Transactions on Internet and Information Systems  
Considering the statistical and semantic features of local words, this paper proposes a microblog user geolocation method by extracting local words based on word clustering and wrapper feature selection  ...  Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed. The cluster subset with the best geolocation effect is selected.  ...  Conclusion This paper studied the problem of microblog user geolocation, and proposed a text-based user geolocation method using extracted local words based on word clustering and wrapper feature selection  ... 
doi:10.3837/tiis.2020.10.003 fatcat:wrkmxvxm2naplmqquelpdvffie