
Comparison of Distributed K-Means and Distributed Fuzzy C-Means Algorithms for Text Clustering

I Made Artha Agastya
2017 Communications in Science and Technology  
Text clustering has been developed in distributed systems due to increasing data.  ...  DFCM-T and DKM-T can perform clustering of 1,400,000 text files in 16.18 and 9.74 minutes, respectively, but the preprocessing takes hours to complete.  ...  Then the Reduce task is performed, calculating the average of every cluster member to obtain the new centroid. Like the DFCM algorithm, the DKM algorithm is run in Mahout [20].  ... 
doi:10.21924/cst.2.1.2017.46 fatcat:eowu56427jhuhkpbpe4jecgpn4

Research of Clustering Algorithms using Enhanced Feature Selection

2019 International Journal of Engineering and Advanced Technology  
This paper's main focus is on document clustering, a subtask of text mining, and on measuring the performance of different clustering techniques.  ...  At present, a huge quantity of data is recorded in a variety of forms such as text, image, video, and audio, and it is expected to grow in the future.  ...  A subtask of text mining is document clustering, where documents are grouped into meaningful clusters such that the documents are similar to each other within the cluster and dissimilar to other in different  ... 
doi:10.35940/ijeat.b5115.129219 fatcat:wjh3oghnevf7xmoajzdxpng2hu

Topic Detection and Tracking Based on Windowed DBSCAN and Parallel KNN

Chuanzhen Li, Minqiao Liu, Juanjuan Cai, Yang Yu, Hui Wang
2020 IEEE Access  
of topic detection and tracking tasks.  ...  An improved density-based spatial clustering of applications with noise (DBSCAN) algorithm based on the time window is proposed to achieve accurate topic detection with the auxiliary advantage  ...  Specifically, the goal of text clustering in topic detection tasks is to merge the texts related to the same topic into a cluster.  ... 
doi:10.1109/access.2020.3047458 fatcat:hkv7ezerrrcohpszwpixcmrxoa

Growing Self Organising Map Based Exploratory Analysis Of Text Data

Sumith Matharage, Damminda Alahakoon
2014 Zenodo  
Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense.  ...  A comprehensive analysis of the GSOM's capabilities as a text clustering and visualisation tool has so far not been published.  ...  After the initial success of the SOM in text clustering tasks, a family of SOM based algorithms has been developed.  ... 
doi:10.5281/zenodo.1092386 fatcat:umwzysllercqth5onvjirkjcdq

A Graph-based Text Similarity Measure That Employs Named Entity Information

Leonidas Tsekouras, Iraklis Varlamis, George Giannakopoulos
2017 RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning  
Text comparison is an interesting though hard task, with many applications in Natural Language Processing.  ...  Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means  ...  The development of a text comparison algorithm is a critical step in many Natural Language Processing and Text Mining tasks, such as text clustering, categorization and summarization.  ... 
doi:10.26615/978-954-452-049-6_098 dblp:conf/ranlp/TsekourasVG17 fatcat:ght5qvy6ivevneqlk4jx45lee4

Comparison Clustering using Cosine and Fuzzy set based Similarity Measures of Text Documents [article]

Manan Mohan Goyal, Neha Agrawal, Manoj Kumar Sarma, Nayan Jyoti Kalita
2015 arXiv   pre-print
Also a comparison is drawn based on accuracy of clustering between fuzzy and cosine similarity measure.  ...  The start time and end time parameters for formation of clusters are used in deciding optimum similarity measure.  ...  A good clustering of text requires effective feature selection and a proper choice of the algorithm for the task at hand. In [11] they have analysed Document Clustering on various datasets.  ... 
arXiv:1505.00168v1 fatcat:ttwtgydysrcvfhxigujhbyf7uu

Clustering of Deep Contextualized Representations for Summarization of Biomedical Texts [article]

Milad Moradi, Matthias Samwald
2019 arXiv   pre-print
In recent years, summarizers that incorporate domain knowledge into the process of text summarization have outperformed generic methods, especially for summarization of biomedical texts.  ...  Although the summarizer does not use any sources of domain knowledge, it can capture the context of sentences more accurately than the comparison methods.  ...  We use an agglomerative hierarchical clustering algorithm in this step. The clustering algorithm starts by specifying the number of final clusters, i.e. the parameter K.  ... 
arXiv:1908.02286v2 fatcat:3pa62vuvi5hnrg5fkvvm7fsjmq

Performance Analysis for Crowdsourcing Context Submission using Hierarchical Clustering Algorithm and Classification

S. P. Jadhav, M. R. Patil
2014 International Journal of Computer Applications  
Hence the proposed system uses a hierarchical clustering algorithm with text mining methods and classification for relation submission to overcome the problems present in the existing system.  ...  Results obtained by the existing system show that the k-means algorithm with text mining methods does not do the entire trick of evaluating submissions.  ...  Table 2: Comparison between existing and proposed system. Fig 3: Comparison of K-means and hierarchical clustering algorithm.  ... 
doi:10.5120/16055-5275 fatcat:vf5fvfaiu5hbrp4fg3a5muvhba

On Performance Evaluation of BM-Based String Matching Algorithms in Distributed Computing Environment

Kunaphas Kongkitimanon, Boonsit Yimwadsana
2017 International Journal of Future Computer and Communication  
String matching algorithms play an important role in many applications of computer science, in particular searching, retrieving and processing of data.  ...  Although these algorithms offer significant performance improvement over the BM algorithm, they were designed with the assumption of single core computer architecture which executes the algorithm in a  ...  Acknowledgment: This research project was supported by Faculty of Information and Communication Technology, Mahidol University and the Integrative Computational Bioscience Center, Mahidol University, Bangkok  ... 
doi:10.18178/ijfcc.2017.6.1.479 fatcat:mllosw7dbjaplebzareuweksyu

Text Clustering Algorithms: A Review

Himanshu Suyal, Amit Panwar, Ajit Singh Negi
2014 International Journal of Computer Applications  
This paper briefly covers the various kinds of text clustering algorithms, the present scenario of text clustering algorithms, analysis and comparison of various aspects which contain sensitivity, stability  ...  This data is in unstructured format, which makes it tedious to analyze, so we need methods and algorithms which can be used with various types of text formats.  ...  Some text clustering algorithms use the frequent data item [4]. Clustering can be used for a number of tasks.  ... 
doi:10.5120/16946-7075 fatcat:op3xwjavtraehgknfkf3hidcqy

A General Bio-inspired Method to Improve the Short-Text Clustering Task [chapter]

Diego Ingaramo, Marcelo Errecalde, Paolo Rosso
2010 Lecture Notes in Computer Science  
It takes as input the results obtained by arbitrary clustering algorithms and refines them in different stages.  ...  The proposal shows an interesting improvement in the results obtained with different algorithms on several short-text collections.  ...  We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project for funding the work of the second and third authors.  ... 
doi:10.1007/978-3-642-12116-6_56 fatcat:2ijpmh3ay5dmzija6bznl6tbhm

Text Document Clustering Using Dimension Reduction Technique

A. Sudha Ramkumar, B. Poorna
2016 International Journal of Applied Engineering Research  
This paper presents an experimental analysis of the performance of document clustering with the InfoGain technique and proves that this method significantly improves the performance in terms of Accuracy  ...  Text document clustering is used to group a set of documents based on the information they contain and to provide retrieval results when a user browses the internet.  ...  Experimental results with the K-Means algorithm shown in this research had remarkable improvement in terms of accuracy of text clustering.  ... 
doi:10.37622/ijaer/11.7.2016.4770-4774 fatcat:crgalyqoyrg55b3z4iyhbz5la4

A New Agglomerative Hierarchical Clustering Algorithm Implementation based on the Map Reduce Framework

Hui Gao, Jun Jiang, Li She, Yan Fu
2010 International Journal of Digital Content Technology and its Applications  
Text clustering is one of the difficult and hot research fields in text mining research.  ...  It divides the large text vector dataset into data blocks, each of which is then processed in a different distributed data node of the Map Reduce framework with an agglomerative hierarchical clustering algorithm.  ...  Acknowledgment: This work is partially supported by the National Natural Science Foundation of China (60973069, 90924011) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars  ... 
doi:10.4156/jdcta.vol4.issue3.9 fatcat:vy2cb5hvyjezjc2tuu7ofn2vom

A Systematic study of Text Mining Techniques

Pravin Shinde, Sharvari Govilkar
2015 International Journal on Natural Language Computing  
Text mining involves the pre-processing of document collections such as information extraction, term extraction, text categorization, and storage of intermediate representations.  ...  The techniques that are used to analyse these intermediate representations such as clustering, distribution analysis, association rules and visualisation of the results.  ...  Clustering Algorithms Several different variants of an abstract clustering problem exist.  ... 
doi:10.5121/ijnlc.2015.4405 fatcat:pv6lawef7ngvzc7xwnf5aaj4ou

Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction [article]

Ayaan Haque, Viraaj Reddi, Tyler Giallanza
2021 arXiv   pre-print
Recent NLP research focuses on classifying, from a given piece of text, if an individual is suicidal or clinically healthy.  ...  Early detection of suicidal ideation in depressed individuals can allow for adequate medical attention and support, which in many cases is life-saving.  ...  For comparison of our method against other related tasks and methods, we build a dataset for binary classification of clinically healthy text vs suicidal text.  ... 
arXiv:2102.09427v2 fatcat:kmrismcepneddovz2njwraf6ly