Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Multiple Document Summarization Using Principal Component Analysis Incorporating Semantic Vector Space Model
Volume/Issue: 13:2
Authors: Vikas, Om; Meshram, Akhil K; Meena, Girraj; Gupta, Amit
Pages: 141-156
Keywords: Principal Component Analysis; Wordnet; Topic Feature; Summarization; Semantic Vector Space Model (THCI Core)
Publication date: June 2008

Chinese Abstract

English Abstract

Text summarization is very effective in relevance assessment tasks. The Multiple Document Summarizer presents a novel approach to selecting sentences from documents according to several heuristic features. Summaries are generated by modeling the set of documents as a Semantic Vector Space Model (SVSM) and applying Principal Component Analysis (PCA) to extract topic features. A purely statistical VSM assumes that terms are independent of each other, which may lead to inconsistent results. The vector space is enhanced semantically by modifying the weights of the word vector as governed by Appearance and Disappearance (Action class) words. The knowledge base for Action words is maintained by classifying the words as Appearance or Disappearance with the help of Wordnet. The weights of the action words are modified in accordance with the Object list, which is prepared by collecting the nouns corresponding to the action words. The summary thus generated provides more informative content because the semantics of natural language have been taken into consideration.
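
As a rough illustration of the pipeline the abstract outlines (a weighted vector space, PCA, then sentence selection), here is a minimal sketch. It is not the authors' implementation. The function names, the whitespace tokenizer, the power-iteration PCA, and the `action_weights` dictionary (a hypothetical stand-in for the Wordnet-based Appearance/Disappearance weighting) are all assumptions made for illustration, and only the first principal component is used as a topic feature.

```python
import math
import random
from collections import Counter


def term_sentence_matrix(sentences, action_weights=None):
    """One row per sentence, one column per term: count times an optional weight."""
    action_weights = action_weights or {}  # hypothetical stand-in for action-word weights
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[w] * action_weights.get(w, 1.0) for w in vocab])
    return vocab, rows


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else vec


def first_principal_component(rows, iterations=200, seed=0):
    """Mean-center the rows, then run power iteration on X^T X."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    v = _normalize([rng.random() for _ in range(d)])
    for _ in range(iterations):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        if all(x == 0.0 for x in w):
            break  # no variance left along the current direction
        v = _normalize(w)
    return centered, v


def summarize(sentences, k=2, action_weights=None):
    """Keep the k sentences with the largest absolute projection on the component."""
    _, rows = term_sentence_matrix(sentences, action_weights)
    centered, v = first_principal_component(rows)
    scores = [abs(sum(c[j] * v[j] for j in range(len(v)))) for c in centered]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # restore original order


if __name__ == "__main__":
    docs = [
        "the flood destroyed the bridge",
        "rain fell",
        "engineers rebuilt the bridge quickly",
    ]
    # The weights below are made-up values, not taken from the paper.
    print(summarize(docs, k=2, action_weights={"destroyed": 2.0, "rebuilt": 2.0}))
```

Scoring by absolute projection is one simple choice among several. A fuller treatment would keep multiple components and combine them with the other heuristic features the abstract mentions.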

Related Literature