Supervised and Unsupervised Document Classification-A survey

Deepshikha Kalita
unpublished
Users want their documents organized in a systematic and secure way. Consider a large collection of books that contains novels, storybooks, and other fiction, as well as books on culture and heritage, history, geography, and so on. Suppose someone asks for a book on history. Finding it among all the other books is difficult: a manual search may take several hours, or even days. If the books were grouped into categories according to some criteria, searching would be more efficient and the collection more secure. A major problem faced by institutions, organizations, and businesses today is information overload. Separating useful documents from those that are not of interest challenges the ingenuity and resources of both individuals and organizations. Keyword search engines can help, but they have limitations: keyword searches do not discriminate by context. Manual classification and clustering, on the other hand, are not feasible for large volumes of documents. We therefore need an automatic classifier to manage documents in a more secure way. By classifying a document we can establish the required level of protection with less manual effort. Document classification and clustering are two very important techniques for achieving this goal. In this survey we discuss the various methods and approaches to document classification used to date. We also present a comparative review of the existing methods along with their advantages and disadvantages.
fatcat:veilo2p46fe4rpjnpkohcmccli