Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: Chinese Word Segmentation as Character Tagging
Volume/Issue: 8:1
Author: Xue, Nianwen
Pages: 29-47
Keywords: Chinese word segmentation, character tagging, maximum entropy, supervised machine-learning, THCI Core
Publication date: February 2003

English Abstract

In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign to Chinese characters, or hanzi, tags that indicate the position of a hanzi within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach is competitive against other supervised machine-learning segmenters reported in previous studies, achieving precision and recall rates of 95.01% and 94.94% respectively, trained on a 237K-word training set.
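The pipeline described in the abstract (tag each hanzi with its position in a word, convert the tags back into segmented text, then score precision and recall) can be illustrated with a minimal sketch. The abstract does not state the tag inventory, so the four-tag set used here (`B` begin, `M` middle, `E` end, `S` single-character word) is an assumption, as are all function names. The maximum entropy tagger itself is not shown; the sketch covers only the conversion and evaluation steps, and the evaluation assumes both segmentations cover the same character string.

```python
from typing import List, Set, Tuple

# Assumed tag set (not taken from the paper): B = first character of a word,
# M = word-internal character, E = last character, S = single-character word.

def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a manually segmented sentence into (character, tag) pairs."""
    tagged = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
            continue
        tagged.append((word[0], "B"))
        for ch in word[1:-1]:
            tagged.append((ch, "M"))
        tagged.append((word[-1], "E"))
    return tagged

def tags_to_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Convert tagged characters back into a list of words."""
    words, current = [], ""
    for ch, tag in tagged:
        # A B or S tag starts a new word, so flush any unfinished one first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # An E or S tag closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) character offset pair."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans

def precision_recall(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall; a word counts only if both ends match."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall

gold = ["上海", "计划", "到", "本世纪", "末"]
predicted = ["上海", "计划", "到", "本", "世纪", "末"]
round_trip = tags_to_words(words_to_tags(gold))
precision, recall = precision_recall(gold, predicted)
```

In this toy example the predicted segmentation splits one gold word in two, so four of its six words match the gold standard (precision 4/6) and four of the five gold words are recovered (recall 4/5).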
