Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Latent Semantic Language Modeling and Smoothing
Volume/Issue: 9:2
Authors: Chien, Jen-tzung; Wu, Meng-sung; Peng, Hua-jui
Pages: 029-044
Keywords: language modeling, latent semantic analysis, speech recognition, parameter smoothing
Publication Date: 2004/08

English Abstract

Language modeling plays a critical role in automatic speech recognition. Typically, n-gram language models suffer from a poor representation of historical words and an inability to estimate unseen parameters due to insufficient training data. In this study, we explore the application of latent semantic information (LSI) to language modeling and parameter smoothing. Our approach adopts latent semantic analysis to transform all words and documents into a common semantic space. The word-to-word, word-to-document, and document-to-document relations are accordingly exploited for language modeling and smoothing. For language modeling, we present a new representation of historical words based on retrieval of the most relevant document. We also develop a novel parameter smoothing method, in which the language models of seen and unseen words are estimated by interpolating the k nearest seen words in the training corpus. The interpolation coefficients are determined according to the closeness of the words in the semantic space. As shown by experiments, the proposed modeling and smoothing methods can significantly reduce the perplexity of language models at a moderate computational cost.
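The LSA projection and nearest-word interpolation described in the abstract can be sketched as follows. This is a minimal illustration only: the toy term-document counts, the vocabulary, `latent_dim`, and `k` are assumptions for demonstration, not values or details taken from the paper.

```python
import numpy as np

# Toy term-document count matrix (rows = words, columns = documents).
vocab = ["speech", "recognition", "language", "model", "audio"]
A = np.array([
    [2, 0, 1],
    [1, 0, 2],
    [0, 3, 1],
    [0, 2, 2],
    [1, 0, 0],
], dtype=float)

# Latent semantic analysis: a truncated SVD projects words (and documents)
# into a common low-dimensional semantic space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
latent_dim = 2
word_vecs = U[:, :latent_dim] * s[:latent_dim]  # word coordinates in the space

def cosine(u, v):
    """Closeness of two words in the semantic space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Unigram probabilities of seen words, estimated from corpus counts.
counts = A.sum(axis=1)
p_seen = counts / counts.sum()

def smoothed_prob(word_idx, k=2):
    """Smooth a word's probability by interpolating its k nearest seen
    words, with coefficients derived from semantic closeness."""
    sims = np.array([cosine(word_vecs[word_idx], word_vecs[j])
                     for j in range(len(vocab))])
    sims[word_idx] = -np.inf          # exclude the word itself
    nearest = np.argsort(sims)[-k:]   # indices of the k closest seen words
    weights = np.maximum(sims[nearest], 0.0)
    if weights.sum() == 0.0:
        return 1.0 / len(vocab)       # fall back to a uniform estimate
    weights /= weights.sum()          # normalized interpolation coefficients
    return float(weights @ p_seen[nearest])
```

In practice the same interpolation would be applied to conditional n-gram probabilities rather than unigrams, and the semantic space would be built from a full training corpus; this sketch only shows the mechanics of closeness-weighted interpolation.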
