Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: A Maximum Entropy Approach for Semantic Language Modeling
Volume/Issue: 11:1
Authors: Chueh, Chuang-hua; Wang, Hsin-min; Chien, Jen-tzung
Pages: 037-055
Keywords: Language Modeling; Maximum Entropy; Latent Semantic Analysis; Speech Recognition; THCI Core
Publication Date: 2006-03


English Abstract

The conventional n-gram language model exploits only the immediate context of historical words without exploring long-distance semantic information. In this paper, we present a new information source extracted from latent semantic analysis (LSA) and adopt the maximum entropy (ME) principle to integrate it into an n-gram language model. With the ME approach, each information source serves as a set of constraints that should be satisfied while estimating a hybrid statistical language model with maximum randomness. For a comparative study, we also carry out knowledge integration via linear interpolation (LI). In experiments on the TDT2 Chinese corpus, we find that the ME language model combining trigram features and semantic information achieves a 17.9% perplexity reduction over the conventional trigram language model and outperforms the LI language model. Furthermore, in evaluation on a Mandarin speech recognition task, the ME and LI language models reduce the character error rate by 16.9% and 8.5%, respectively, relative to the bigram language model.
