A Survey on Automatic Semantic Subject Indexing of Documents using Big Data Analytics
International Journal for Research in Applied Science and Engineering Technology
Automatic subject indexing of documents is a pressing issue because of the growth in the quantity and diversity of digital documents available to end users, which creates a need for effective and efficient indexing and retrieval techniques. Indexing is crucial because it allows documents to be located quickly. Instead of full-text indexing, metadata such as the publication title and abstract may be considered for performance and accuracy. To retrieve the documents which are
... ntextually related by annotating the massive collection with only the title and abstract, whereas individual words provide unreliable evidence about the conceptual topic or meaning of a document. Hence, the available approaches cannot meet several of the data-processing challenges, which leads to inefficient query results. Indexing strategies that can support these demands therefore need to be designed. Various indexing strategies are already used to address Big Data management issues, and they can also serve as a base for designing more efficient ones. The aim is to explore methods of indexing and retrieving documents for different query search types, drawing on subject indexing strategies for Big Data manageability and identifying the challenges of existing strategies. Existing strategies such as Vector Space Models, Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Logistic Regression, Linear Discriminant Analysis, and Naïve Bayes each have their own challenges. This paper describes some of the automatic subject indexing approaches applied to retrieve subject-specific documents and presents their characteristics and the challenges involved.
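As an illustration of metadata-only indexing with a Vector Space Model, the following is a minimal sketch rather than any method taken from the survey. It assumes each document is represented only by its title and abstract text, weights terms with a smoothed TF-IDF scheme, and ranks documents by cosine similarity. The function names (`build_index`, `search`) and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build TF-IDF vectors from a dict of doc_id -> title/abstract text."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency (an assumed weighting choice).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf):
    """Return ids of documents with non-zero similarity, best match first."""
    query_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0.0]


# Hypothetical sample collection: one title-like string per document.
docs = {
    "d1": "Semantic subject indexing of digital documents",
    "d2": "Query processing for relational databases",
    "d3": "Deep learning for image classification",
}
vectors, idf = build_index(docs)

exact_hits = search("subject indexing", vectors, idf)    # shares words with d1
related_hits = search("topic annotation", vectors, idf)  # no shared words
```

In this sketch the query "subject indexing" matches `d1` through shared words, while "topic annotation" returns nothing even though it is conceptually close to `d1`. This is the word-level unreliability the abstract points to.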