Multiple Document Summarization Using Principal Component Analysis Incorporating Semantic Vector Space Model

Om Vikas, Akhil K. Meshram, Girraj Meena, Amit Gupta
2008 International Journal of Computational Linguistics and Chinese Language Processing  
Text summarization is very effective in relevance assessment tasks. The Multiple Document Summarizer presents a novel approach to selecting sentences from documents according to several heuristic features. Summaries are generated by modeling the set of documents as a Semantic Vector Space Model (SVSM) and applying Principal Component Analysis (PCA) to extract topic features. A purely statistical VSM assumes terms to be independent of each other, which may lead to inconsistent results. The vector space is ... semantically by modifying the weights of word vectors governed by Appearance and Disappearance (Action class) words. The knowledge base of Action words is maintained by classifying words as Appearance or Disappearance with the help of WordNet. The weights of the action words are modified in accordance with an Object list, prepared by collecting the nouns that correspond to the action words. The summary thus generated provides more informative content, because the semantics of natural language are taken into consideration.
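The abstract does not give the exact weighting formula or the sentence-selection rule, so the following is only a minimal, dependency-free Python sketch of the general pipeline under stated assumptions. Sentences become term-frequency vectors. A tiny hand-written lexicon (`ACTION_CLASS`, `OBJECT_LIST`, both hypothetical) stands in for the WordNet-derived knowledge base and the Object list. An action word's weight is multiplied by `1 + BOOST * m`, where `m` is the number of its object nouns appearing in the same sentence (a hypothetical rule). PCA is computed by power iteration, each principal component is treated as a topic feature, and the sentence projecting most strongly onto it is selected. None of these names or rules are taken from the paper.

```python
import math
import random
import re

# Hypothetical stand-in for the WordNet-derived knowledge base: each action
# word is tagged Appearance or Disappearance. This sketch treats both classes
# alike when weighting (an assumption; the abstract does not say how they differ).
ACTION_CLASS = {
    "launch": "appearance", "build": "appearance", "open": "appearance",
    "close": "disappearance", "destroy": "disappearance", "remove": "disappearance",
}
# Hypothetical Object list: nouns collected for each action word.
OBJECT_LIST = {"launch": {"satellite", "rocket"}, "destroy": {"bridge", "building"}}
BOOST = 0.5  # assumed weight increase per matching object noun


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def sentence_vectors(sentences):
    """Term-frequency vectors; an action word's weight is scaled up when the
    same sentence contains nouns from its Object list (assumed rule)."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(vocab)
        for t in toks:
            vec[index[t]] += 1.0
        present = set(toks)
        for word in present & set(ACTION_CLASS):
            matches = len(OBJECT_LIST.get(word, set()) & present)
            vec[index[word]] *= 1.0 + BOOST * matches
        vectors.append(vec)
    return vectors, vocab


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(vectors):
    """Subtract the per-term mean so PCA sees variation, not raw frequency."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - means[j] for j in range(d)] for v in vectors]


def principal_components(X, k, iters=200, seed=0):
    """Top-k principal directions of the centered rows X, found by power
    iteration on X^T X with Gram-Schmidt deflation against earlier ones."""
    rng = random.Random(seed)
    d = len(X[0])
    comps = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]
        for _ in range(iters):
            scores = [dot(row, v) for row in X]  # X v
            w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(d)]
            for c in comps:  # remove directions already extracted
                proj = dot(w, c)
                w = [wj - proj * cj for wj, cj in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left: stop with fewer components
                return comps
            v = [wj / norm for wj in w]
        comps.append(v)
    return comps


def summarize(sentences, k=2):
    """For each topic component, pick the not-yet-chosen sentence with the
    largest absolute projection; return picks in original document order."""
    vectors, _ = sentence_vectors(sentences)
    X = center(vectors)
    chosen = []
    for comp in principal_components(X, k):
        ranked = sorted(range(len(X)), key=lambda i: -abs(dot(X[i], comp)))
        for i in ranked:
            if i not in chosen:
                chosen.append(i)
                break
    return [sentences[i] for i in sorted(chosen)]
```

For example, `summarize(sentences, k=2)` returns at most two sentences, in the order they appear in the input. Power iteration is used only to keep the sketch free of external dependencies; a practical implementation would more likely compute the components with an SVD routine from a numerical library.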
dblp:journals/ijclclp/VikasMMG08 fatcat:mrg2zb5eobhufobt7ajtis6zvq