Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: A Study on Chinese Spelling Check Using Confusion Sets and N-gram Statistics
Volume/Issue: 20:1
Authors: Lin, Chuan-Jie; Chu, Wei-Cheng
Pages: 023-048
Keywords: Chinese Spelling Check; Confusion Set Expansion; Google Ngram; Scoring Function; THCI Core
Publication Date: 2015-06

Chinese Abstract

English Abstract

This paper proposes an automatic method for building a Chinese spelling check system. The confusion sets were expanded using two language resources, Shuowen Jiezi and the Four-Corner codes, which improved their coverage. Nine scoring functions that use frequency data from the Google Ngram Datasets were proposed, and the idea of smoothing was also adopted. Thresholds were likewise determined automatically. The final system performed far better than our baseline system in the CSC 2013 Evaluation Task.
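The abstract does not spell out the nine scoring functions or the threshold procedure, so the following is only a rough sketch of the general approach it describes: substitute characters from a confusion set, score each candidate sentence with smoothed n-gram frequencies, and accept a correction only when the score gain exceeds a threshold. Everything here is an assumption made for illustration: the toy `CONFUSION_SETS` and `NGRAM_COUNTS` tables, the add-alpha smoothing, the log-count scoring, and the function names and default threshold. None of it is taken from the paper.

```python
import math

# Hypothetical toy confusion sets: each character maps to the characters
# it is easily confused with. Illustrative entries only.
CONFUSION_SETS = {
    "在": {"再"},
    "再": {"在"},
}

# Hypothetical toy bigram counts standing in for a large n-gram frequency dataset.
NGRAM_COUNTS = {
    "我們": 5000,
    "們在": 60,
    "們再": 40,
    "再見": 900,
    "在見": 3,
}


def score(sentence, n=2, alpha=1.0):
    """Sum of log add-alpha-smoothed counts over all character n-grams (assumed scoring)."""
    total = 0.0
    for i in range(len(sentence) - n + 1):
        gram = sentence[i:i + n]
        # Smoothing keeps unseen n-grams from producing log(0).
        total += math.log(NGRAM_COUNTS.get(gram, 0) + alpha)
    return total


def candidates(sentence):
    """Yield (position, replacement, new_sentence) for every single-character substitution."""
    for i, ch in enumerate(sentence):
        for alt in sorted(CONFUSION_SETS.get(ch, ())):
            yield i, alt, sentence[:i] + alt + sentence[i + 1:]


def check(sentence, threshold=1.0):
    """Return (gain, position, replacement, corrected) for the best candidate
    whose score gain exceeds the threshold, or None if there is none."""
    base = score(sentence)
    best = None
    for pos, alt, cand in candidates(sentence):
        gain = score(cand) - base
        if gain > threshold and (best is None or gain > best[0]):
            best = (gain, pos, alt, cand)
    return best
```

Calling `check("我們在見")` with these toy tables proposes replacing the character at index 2 with 再, while `check("我們再見")` returns `None` because no substitution improves the score. In a real system the threshold would be tuned on held-out data rather than fixed by hand.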

Related Literature