An Efficient Approach in Text Clustering Based on Frequent Itemsets

S.Murali Krishna, S.Durga Bhavani
2013 International Journal of Innovative Research in Computer and Communication Engineering  
This increasing number of textual data has led to the task of mining useful or interesting frequent itemsets (words/terms) from very large text databases and still it seems to be quite challenging.  ...  The use of such frequent itemsets for text clustering has received a great deal of attention in research community since the mined frequent itemsets reduce the dimensionality of the documents drastically  ...  [27] have introduced a frequent term based parallel clustering algorithm which could be employed to cluster short documents in very large text database.  ... 
doi:10.15680/ijircce.2013.0107018 fatcat:bcmlcbn36jexvfvc2am4ewvxwi

A Two-Stage Method for Scientific Papers Analysis

Damien Hanyurwimfura, Bo Liao, Emmanuel Masabo, Gaurav Bajpai
2014 Journal of Software  
Many available resources are in the form of unstructured text format of long text pages which require long time to read and analyze.  ...  format and grouped in different clusters according to their topics using a multi-word based clustering method.  ...  paper in a very short time reducing time and space.  ... 
doi:10.4304/jsw.9.10.2564-2573 fatcat:4ucslocdhjhpnpia4quksxxpgq

A Review on Knowledge Discovery using Text Classification Techniques in Text Mining

Chauhan ShrihariR, Amish Desai
2015 International Journal of Computer Applications  
With rapid growing of information increasing trends in people to extract knowledge from large text document.  ...  Knowledge discovery from textual database is a process of extracting interested or non retrival pattern from unstructured text document.  ...  Clustering is unsupervised method. Clustering technique used to group similar documents but it differs from categorization, in this documents are clustered.  ... 
doi:10.5120/19542-0784 fatcat:xwgei3efuvbtzktmu45sdt73hy

Document Image Coding and Clustering for Script Discrimination [article]

Darko Brodic, Alessia Amelio, Zoran N. Milivojevic, Milena Jevtic
2016 arXiv pre-print
It defines feature vectors representing the script content of the document. A modified clustering approach employed on document feature vector groups documents written in the same script.  ...  The paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values.  ...  Table 2: Clustering results on the second database. Standard deviation is given in parentheses.  ... 
arXiv:1609.06492v1 fatcat:rcwo3rvpbzggpis6l6hclysvqm

Short-Text Clustering using Statistical Semantics

Sepideh Seifzadeh, Ahmed K. Farahat, Mohamed S. Kamel, Fakhri Karray
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
Short documents are typically represented by very sparse vectors, in the space of terms.  ...  In this case, traditional techniques for calculating text similarity results in measures which are very close to zero, since documents even the very similar ones have a very few or mostly no terms in common  ...  Text document clustering has been widely used to organize document databases and discover similarity and topics among documents.  ... 
doi:10.1145/2740908.2742474 dblp:conf/www/SeifzadehFKK15 fatcat:hmefetpr3vbyzhenfrusjlwtme

Document Classification Using Part of Speech in Text Mining

2015 International Journal of Science and Research (IJSR)  
Text mining methods are the fundamental and permitting tools for efficient organization, triangulation, retrieval and summarization of large document quantity.  ...  Text mining is a practice that is used to find beneficial in arrangement from the large amount of data sets.  ...  In text mining, every document is represented as a vector, whose dimension is almost the number of different keywords in it, which can be very large.  ... 
doi:10.21275/v4i12.nov152438 fatcat:uwor7e4nwrdbhd2oooz4xkth5m

A Fast Matching Method Based on Semantic Similarity for Short Texts [chapter]

Jiaming Xu, Pengcheng Liu, Gaowei Wu, Zhengya Sun, Bo Xu, Hongwei Hao
2013 Communications in Computer and Information Science  
As the emergence of various social media, short texts, such as weibos and instant messages, are very prevalent on today's websites.  ...  The major advantages of SSHash are that 1) SSHash alleviates the sparse problem in short texts, because we obtain the latent features from whole corpus regardless of document level; and 2) SSHash can accomplish  ...  In contrast, when the number of topics is large, the semantic features discovered are very specific.  ... 
doi:10.1007/978-3-642-41644-6_28 fatcat:wmppbbuccze6heprhgjmye4vqy

Text Mining Methods and Techniques

Sonali VijayGaikwad, Archana Chaugule, Pramod Patil
2014 International Journal of Computer Applications  
In many applications, databases store information in text form, so text mining is one of the most recent areas of research. Extracting the information a user requires is a challenging issue.  ...  In this survey paper we discuss such successful techniques and methods to give effectiveness over information retrieval in text mining.  ...  This technology can be very useful when dealing with large volumes of text.  ... 
doi:10.5120/14937-3507 fatcat:fisy27bdfza5bbrh5npjzsnxau

An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text

M. John Basha, K.P. Kaliyamurthie
2017 International Journal of Electrical and Computer Engineering (IJECE)  
Text clustering plays a key role in navigation and browsing process. For an efficient text clustering, the large amount of information is grouped into meaningful clusters.  ...  To address these issues, an efficient text based clustering framework is proposed. The Reuters dataset is chosen as the input dataset.  ...  Compared to sentence clustering, the clustering of short texts is very difficult.  ... 
doi:10.11591/ijece.v7i1.pp551-558 fatcat:uczortzfc5hzjklygrfbplvoze

Online Newspaper Clustering in Aceh using the Agglomerative Hierarchical Clustering Method

Rizal Tjut Adek, Rozzy Kesuma Dinata, Ananda Ditha
2021 International Journal of Engineering Science and Information Technology  
The grouping of text documents is needed to classify news in online newspapers in Aceh based on the content contained in news articles.  ...  The study was conducted by taking online news text data on 10 online news websites in Aceh from July 2016 to March 2017 with 1000 randomly generated documents.  ...  In short, text mining aims to find patterns throughout a very large document collection [18].  ... 
doi:10.52088/ijesty.v2i1.206 fatcat:ksinhh2vkzcznhbtuogwqfbecy

Timeline Generation After Summarization of Evolutionary Tweet Streams

Madhuri Jiwe
2019 IJARCCE  
Short messages in text form, such as tweets, are being created and shared at an unprecedented rate. Tweets, in their raw form while being informative, can also be enormous.  ...  Sumblr differs from traditional summarization methods, which focus on static and small-scale data sets; it is designed to deal with large-scale, dynamic and fast-arriving tweet streams  ...  thanks of gratitude to my guide who gave me the golden opportunity to do this wonderful project on the topic Timeline generation after summarization of evolutionary tweet streams which also helped me in  ... 
doi:10.17148/ijarcce.2019.8454 fatcat:2h2fbmm2ercexeh4lr4heecjya

Text Mining Using Metadata for Generation of Side Information

Shraddha S. Bhanuse, Shailesh D. Kamble, Sandeep M. Kakde
2016 Procedia Computer Science  
Text Mining is knowledge discovery process from large database to find out unknown patterns.  ...  In many metadata based text mining applications, side information also known as metadata which is associated with the text document.  ...  large text data in text mining area [15, 16, 17].  ... 
doi:10.1016/j.procs.2016.02.061 fatcat:7rjgaacqkneetdbmfktpi2o6wa

Performance Evaluation of Cluster Based Algorithm used for Text Document Classification

2015 International Journal of Science and Research (IJSR)  
In this paper we develop a complete methodology for document classification and clustering.  ...  We use these findings in the construction of a Gaussian Mixture Document Clustering (GMDC) algorithm. This algorithm models the data as a sample from a Gaussian mixture.  ...  The query might be considered a very short document consisting of a few keywords, and the goal then is to find the documents in the collection that are most similar to the query document.  ... 
doi:10.21275/v5i5.7051602 fatcat:7astteuharayne2uyjnxljto4u


2000 Biocomputing 2001  
We present an algorithm for large-scale document clustering of biological text, obtained from Medline abstracts.  ...  Experiments show that the resulting document clusters are meaningful as assessed by cluster-specific terms.  ...  The feature/instance ratio may have to be reduced for very large-scale experiments.  ... 
doi:10.1142/9789814447362_0038 fatcat:kyp762w4nzctrmapjhrl6hozsq

Natural language processing methods for knowledge management—Applying document clustering for fast search and grouping of engineering documents

Ivar Örn Arnarsson, Otto Frost, Emil Gustavsson, Mats Jirstrand, Johan Malmqvist
2021 Concurrent Engineering - Research and Applications  
These documents are rich in unstructured data (e.g. free text).  ...  In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents from databases containing Engineering  ...  It is often time consuming and difficult to find related documents in a list generated from a search into a large database.  ... 
doi:10.1177/1063293x20982973 fatcat:7cxtci2v2na5jo5iemp6u6zryu