Improved k-means Clustering for Document Categorization

Amandeep Kaur, Tarun Kumar
International Research Journal of Engineering and Technology   unpublished
Document categorization, also called document classification, sorts useful documents and classifies them by content. It is a machine learning task within Natural Language Processing (NLP). Our goal is to assign one or more classes or categories to a document, which makes documents easier to sort and manage. In our research, the documents of a dataset are read and preprocessed: special symbols and stop words are removed, stemming is applied, and the text is converted to lowercase to reduce processing time. The occurrences of repeated words are counted, and tf-idf weights are calculated for the vector space model. We also predict the cluster centers and find the nearest neighbor. To evaluate performance, precision, recall, and F-measure are calculated.
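
The pipeline outlined in the abstract (preprocessing, tf-idf weighting, k-means with nearest-center assignment, and precision/recall/F-measure) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The stop-word list, the four toy documents, the function names, and the choice of seeding the centers from two fixed documents are all assumptions made for the example. The abstract does not say how the centers are predicted, and stemming is left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}

def preprocess(text):
    """Lowercase the text, drop special symbols, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Build tf-idf vectors (tf = relative frequency, idf = log(N / df))."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1  # guard against empty documents
        vectors.append([counts[t] / total * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))

def kmeans(vectors, init_centers, max_iter=100):
    """Plain k-means: assign to nearest center, recompute means, repeat."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centers) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def precision_recall_f(true_labels, pred_labels, positive):
    """Precision, recall, and F-measure for one class."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy corpus (assumed for illustration): two pet documents, two finance documents.
raw_docs = [
    "The cat and the kitten are pets.",
    "The dog and the cat are pets!",
    "Stocks trade on the market.",
    "The bank is in the stocks market.",
]
true_labels = [0, 0, 1, 1]

docs = [preprocess(d) for d in raw_docs]
vocab, vectors = tfidf_vectors(docs)
# Assumption: seed the centers from documents 0 and 2 to keep the run deterministic.
labels, centers = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
print(precision_recall_f(true_labels, labels, positive=0))
```

Seeding the centers from fixed documents keeps the run reproducible and lets the cluster indices line up with the toy class labels. With random initialization, cluster indices would first have to be mapped to classes before precision, recall, and F-measure could be computed.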
fatcat:oytkcecldrbf7ciqiqjdv4673e