Development of Document Clustering Technique for Gurmukhi Script using Fuzzy Term Weight

2019 International journal of recent technology and engineering  
as extraction of named/noun entities, creation of cluster titles and placing text documents into relevant clusters using fuzzy term weight.  ...  , and in terms of performance regarding assignment of real time unlabeled data sets to the relevant cluster as a result of various pre-processing steps like segmentation, stemming, normalization as well  ...  Thus, the two words with different spellings "ਵਿੱਤੀ" and "ਵਿਤੀ" will be considered as similar words and after performing normalization process, the text documents consisting non-uniform spellings of words  ... 
doi:10.35940/ijrte.b2386.078219 fatcat:opzm46oez5cy7lwhyonkswsjge

Role of Pre-processing Phase in Document Clustering Technique for Gurmukhi Script

The sub-phases of the pre-processing phase are: segmentation, tokenization, removal of stop words, stemming, and normalization.  ...  This paper concentrates on the pre-processing phase of a document clustering technique for Gurmukhi script. The purpose of the pre-processing phase is to convert unstructured text into a structured text format.  ...  The normalization process captures similarity among text documents that contain the same words with non-uniform spellings, and places such documents in the same cluster.  ... 
doi:10.35940/ijitee.c9105.019320 fatcat:tbezkwuzmvaknlv2x55bkolaky

Textual Coherence Improvement of Extractive Document Summarization Using Greedy Approach and Word Vectors

Mohamad Abdolahi, Morteza Zahedi
2019 International Journal of Modern Education and Computer Science  
Suggested approach compares its result to similar model Q_Network and shows the superiority of its algorithm in confronting with long text document.  ...  There is a growing body of attention to importance of document summarization in most NLP tasks.  ...  Then, using the most likely n-grams in the same text, the matrices of sentences are normalized.  ... 
doi:10.5815/ijmecs.2019.04.03 fatcat:cknfrzgdpzdeji2lwvdwiap7ny

Application of Latent Dirichlet Allocation (LDA) for clustering financial tweets

Sifi Fatima-Zahrae, Sabbar Wafae, El Mzabi Amal, S. Krit
2021 E3S Web of Conferences  
For this purpose, different text preprocessing techniques have been used on the dataset to achieve an acceptable standard text.  ...  be used for analyzing user's comments on Twitter social network.  ...  To do that, we used normalization, removing stop words and tokenization to clean and cluster the data.  ... 
doi:10.1051/e3sconf/202129701071 fatcat:e6eerbzhpnfirihqqhtuwdd7jq

Research on Keyword Extraction Algorithm in English Text Based on Cluster Analysis

Jingxia Ma, Vijay Kumar
2022 Computational Intelligence and Neuroscience  
Text clustering can improve the efficiency of information search and is an effective text retrieval method.  ...  Keyword extraction and cluster center point selection are key issues in text clustering research.  ...  Points that are correlated with more texts are used as cluster center points, and so on [26] [27] [28] [29]. The existing keyword extraction algorithms under text include semantic-based keyword extraction  ... 
doi:10.1155/2022/4293102 pmid:35387240 pmcid:PMC8979710 fatcat:esphsdolx5bqpa3gcjieybwzay

SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning

Taner UÇKAN, Cengiz HARK, Ali KARCİ
2020 Journal of Polytechnic  
Changes in words are made taking into account the ngram intersections and the relationship with other words in the text.  ...  evaluated using metrics commonly used in text clustering.  ...  DECLARATION OF ETHICAL STANDARDS The authors of this article declare that the materials and methods used in this study do not require ethical committee permission and/or legal-special permission.  ... 
doi:10.2339/politeknik.684558 fatcat:sjn2imwqzzellduynzn5grjtwq

Recent Developments in Text Clustering Techniques

Saurabh Sharma, Vishal Gupta
2012 International Journal of Computer Applications  
This paper reviews and discusses "Text Clustering" and partially covers all major techniques currently in use for the Process.  ...  Clustering of huge number of text documents into different clusters, for better management of information, provides for a wide area in which a whole lot of research is currently being pursued.  ...  Text Clustering follows a very important context of Semantic relationship between word & meaning which is termed as Hyponymy/hypernymy.  ... 
doi:10.5120/4611-6604 fatcat:xeis4dnwrna4leqzo6oymqpftq

Improved Text Summarization of News Articles Using GA-HC and PSO-HC

Muhammad Mohsin, Shazad Latif, Muhammad Haneef, Usman Tariq, Muhammad Attique Khan, Sefedine Kadry, Hwan-Seung Yong, Jung-In Choi
2021 Applied Sciences  
The proposed models use a word embedding model with a Hierarchical Clustering Algorithm to group sentences conveying almost the same meaning.  ...  This study proposed two automatic text summarization models, which are Genetic Algorithm with Hierarchical Clustering (GA-HC) and Particle Swarm Optimization with Hierarchical Clustering (PSO-HC).  ...  TF-IDF plays an important role in identifying the most important words in the text document. These words can help in extracting important sentences from the document.  ... 
doi:10.3390/app112210511 fatcat:nvq5ltd7irhznkio7iffiohmwy

Domain Based Punjabi Text Document Clustering

Saurabh Sharma, Vishal Gupta
2012 International Conference on Computational Linguistics  
Text Clustering is a text mining technique which is used to group similar documents into single cluster by using some sort of similarity measure & separating the dissimilar documents.  ...  In this paper, a hybrid algorithm for clustering of Punjabi text document that uses semantic relations among words in a sentence for extracting phrases has been developed.  ...  Next step is normalization of those words which are used with different spellings. Purpose of normalization is to maintain uniformity of spelling in all documents which contain that word.  ... 
dblp:conf/coling/SharmaG12 fatcat:js3ookxm6bhutjs2pduvphkhsi

An Efficient Text Clustering Framework

Francis M.Kwale
2013 International Journal of Computer Applications  
Text mining on the other hand is an extension of data mining dealing only with (unstructured) text data. Text clustering is thus a text mining technique.  ...  One such technique is text clustering, whereby we group (or cluster) text documents into various groups (or clusters), such as clustering web search engine results into meaningful groups.  ...  It marks up the words in a text with their corresponding parts of speech. • Text chunking that groups adjacent words in a text. • Word Sense Disambiguation (WSD) that resolves ambiguities in words, including  ... 
doi:10.5120/13763-1607 fatcat:6g7bfn66czbpjotzrybnz6shpm

Document Length Variation in the Vector Space Clustering of News in Arabic: A Comparison of Methods

Abdulfattah Omar, Wafya Ibrahim
2020 International Journal of Advanced Computer Science and Applications  
Data is analyzed using different document length normalization methods along with vector space clustering (VSC), and then the analysis on which the clustering structure agrees most closely with the bibliographic  ...  This article is concerned with addressing the effect of document length variation on measuring the semantic similarity in the text clustering of news in Arabic.  ...  INTRODUCTION Variation in document length is widely considered an important factor in the validity of text clustering applications.  ... 
doi:10.14569/ijacsa.2020.0110211 fatcat:rm4kvhdbancvzd3f7xomoyakn4

Identifying Data Set Texture using Normalized Compression Distance

Shahany Habeeb, Syam Gopi
2015 International Journal of Engineering Trends and Technology  
Models that do not preserve text structure or that preserve text structure can be used for presenting text data sets.  ...  Here the main hypothesis is that depending on the nature of data set, there can be advantages of using a model that preserves text structure over one that does not, and vice versa.  ...  ACKNOWLEDGMENT We thank computer science department of Amal Jyothi College of Engineering for providing us with relevant data. This work was supported as part of thesis project.  ... 
doi:10.14445/22315381/ijett-v29p223 fatcat:znhscdtamfbijb2jsuimujb4yi

Term Based Semantic Clusters for Very Short Text Classification

Jasper Paalman, Jheronimus Academy of Data Science, The Netherlands, Shantanu Mullick, Kalliopi Zervanou, Yingqian Zhang, School of Industrial Engineering, Eindhoven University of Technology, The Netherlands
2019 Proceedings - Natural Language Processing in a Deep Learning World  
These clusters are ranked using a semantic similarity function which in turn defines a semantic feature space that can be used for text classification.  ...  Although term occurrences are strong indicators of content, in very short texts, the sparsity of these texts makes it difficult to capture important semantic relationships.  ...  normalized term frequency. For a given class, word embeddings belonging to found terms are used to form numerous clusters.  ... 
doi:10.26615/978-954-452-056-4_102 dblp:conf/ranlp/PaalmanMZZ19 fatcat:hy5lgselmrbujpacemstkach4a

Extractive Multi-Document Arabic Text Summarization using Evolutionary Multi-Objective Optimization with K-medoid Clustering

Rana Alqaisi, Wasel Ghanem, Aziz Qaroush
2020 IEEE Access  
Tasks/Tools: Our Usage. Preprocessing tools: Normalization: word list. Key-Phrase extraction: KP-Miner. Sentence representation: Bag-of-words with TF-ISF. Clustering method: K-medoids. Clusters validation  ...  may rephrase the original words and use different words with the same meaning.  ... 
doi:10.1109/access.2020.3046494 fatcat:cpt4q3ryxzfqlitlgixvgwqu5q

Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur'an Translation Clustering with K-Means Algorithm

M Didik R Wahyudi
2021 Journal of Information Technology and Computer Science  
The Al-Quran translation index issued by the Ministry of Religion can be used in text mining to search for similar patterns of Al-Quran translation.  ...  This study performs sentence grouping using the K-Means Clustering algorithm and three weighting scheme models of the TF-IDF algorithm to get the best performance of the TF-IDF algorithm.  ...  One way to research is with text mining. Various text mining methods can be used to group certain data; one of them is clustering. Text clustering is an important part of the text mining method.  ... 
doi:10.25126/jitecs.202162295 fatcat:wdkskuauovgzdpmudb7fneoikm