Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: Chinese Word Segmentation by Classification of Characters
Volume/Issue: 10:3
Authors: Goh, Chooi-ling; Asahara, Masayuki; Matsumoto, Yuji
Pages: 381-396
Keywords: Chinese unknown word; segmentation ambiguity; word segmentation; maximum matching algorithm; support vector machines; THCI Core
Publication date: 2005-09

Abstract (English)

During the process of Chinese word segmentation, two main problems occur: segmentation ambiguities and unknown word occurrences. This paper describes a method to solve the segmentation problem. First, we use a dictionary-based approach to segment the text, applying the Maximum Matching algorithm both forwards (FMM) and backwards (BMM). Then, based on the differences between the FMM and BMM outputs and on the surrounding context, we apply a classification method based on Support Vector Machines to re-assign the word boundaries. In this way, the output of the dictionary-based approach serves as the input to a machine-learning-based approach that resolves the segmentation problem. Experimental results show that our model achieves an F-measure of 99.0 for overall segmentation when the text contains no unknown words, and an F-measure of 95.1 when unknown words are present.
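The abstract describes a two-stage pipeline: dictionary-based FMM and BMM segmentation, followed by a character-level classifier that re-assigns word boundaries. The Python sketch below shows one way to implement the dictionary stage. It also shows how the FMM/BMM disagreement and the surrounding context might be turned into per-character features. Several parts are assumptions rather than the authors' exact design: the B/M/E/S tag set, the ±2-character window, the feature names, and every function name here are hypothetical. The SVM itself is not included. The toy lexicon and the string 我們結合成分子 ("we combine into molecules") are a commonly cited ambiguity example, not data from the paper. A word-level F-measure helper is also included, since that is the metric the abstract reports.

```python
from typing import Dict, List, Set, Tuple

def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """FMM: scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            # Longest candidate first; a single character is always accepted.
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

def backward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """BMM: scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - max_len), j):
            # Smallest i first means longest candidate first.
            if text[i:j] in lexicon or i == j - 1:
                words.append(text[i:j])
                j = i
                break
    return words[::-1]

def char_tags(words: List[str]) -> List[str]:
    """Map a segmentation to per-character tags (assumed B/M/E/S scheme)."""
    tags: List[str] = []
    for w in words:
        tags.extend(["S"] if len(w) == 1 else ["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def disagreements(fmm_tags: List[str], bmm_tags: List[str]) -> List[int]:
    """Character positions where FMM and BMM assign different tags."""
    return [k for k, (f, b) in enumerate(zip(fmm_tags, bmm_tags)) if f != b]

def char_features(text: str, fmm_tags: List[str], bmm_tags: List[str],
                  i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical feature template: context characters plus both dictionary tags."""
    feats: Dict[str, str] = {}
    for k in range(-window, window + 1):
        pos = i + k
        feats[f"c{k:+d}"] = text[pos] if 0 <= pos < len(text) else "<pad>"
    feats["fmm"] = fmm_tags[i]
    feats["bmm"] = bmm_tags[i]
    feats["agree"] = str(fmm_tags[i] == bmm_tags[i])
    return feats

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans

def word_f_measure(predicted: List[str], gold: List[str]) -> float:
    """Word-level F-measure: a word is correct only if its span matches a gold span."""
    pred, ref = word_spans(predicted), word_spans(gold)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    lexicon = {"我們", "結合", "合成", "成分", "分子"}
    text = "我們結合成分子"
    fmm = forward_max_match(text, lexicon, max_len=2)
    bmm = backward_max_match(text, lexicon, max_len=2)
    fmm_tags, bmm_tags = char_tags(fmm), char_tags(bmm)
    print(fmm, bmm)
    print(disagreements(fmm_tags, bmm_tags))
    print(char_features(text, fmm_tags, bmm_tags, 2))
```

On this toy input, FMM yields 我們/結合/成分/子 and BMM yields 我們/結/合成/分子. Neither matches the intended 我們/結合/成/分子, and the two disagree on every character after 我們. That is the kind of region a boundary classifier is meant to resolve. To train such a classifier, the feature dictionaries could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an SVM such as `sklearn.svm.LinearSVC`. The predicted tags would then be mapped back to words. Both steps are left out of this sketch.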
