HyRead Journal Taiwan Full-Text Database

Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Natural Sciences / Information / Technology

Title	Chinese Word Segmentation by Classification of Characters
Volume:Issue	10:3
Authors	Goh, Chooi-ling; Asahara, Masayuki; Matsumoto, Yuji
Pages	381-396
Keywords	Chinese, unknown word, segmentation ambiguity, word segmentation, maximum matching algorithm, support vector machines, THCI Core
Publication date	2005-09

During the process of Chinese word segmentation, two main problems occur: segmentation ambiguities and unknown word occurrences. This paper describes a method to solve the segmentation problem. First, we use a dictionary-based approach to segment the text. We apply the Maximum Matching algorithm to segment the text forwards (FMM) and backwards (BMM). Based on the difference between FMM and BMM, and the context, we apply a classification method based on Support Vector Machines to re-assign the word boundaries. In so doing, we use the output of a dictionary-based approach, and then apply a machine-learning-based approach to solve the segmentation problem. Experimental results show that our model can achieve an F-measure of 99.0 for overall segmentation, given the condition that there are no unknown words in the text, and an F-measure of 95.1 if unknown words exist.
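
The dictionary-based first stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fmm`, `bmm`, `boundary_offsets`, `disagreement`), the `max_len` parameter, and the four-entry toy dictionary are all assumptions made for illustration, and the Support Vector Machine step that re-assigns boundaries is left out because the abstract does not specify its features. The sketch shows only how forward and backward maximum matching can disagree, which is the signal the abstract says is passed on to the classifier.

```python
def fmm(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, taking the longest
    dictionary word at each position (single characters always match)."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def bmm(text, dictionary, max_len=4):
    """Backward maximum matching: scan right to left, taking the longest
    dictionary word that ends at the current position."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()
    return words


def boundary_offsets(words):
    """Character offsets at which a word ends."""
    offsets = set()
    position = 0
    for word in words:
        position += len(word)
        offsets.add(position)
    return offsets


def disagreement(text, dictionary, max_len=4):
    """Offsets where exactly one of FMM and BMM places a word boundary."""
    forward = boundary_offsets(fmm(text, dictionary, max_len))
    backward = boundary_offsets(bmm(text, dictionary, max_len))
    return sorted(forward ^ backward)


# Hypothetical toy dictionary, not taken from the paper.
toy_dict = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"

print(fmm(sentence, toy_dict))           # ['研究生', '命', '起源']
print(bmm(sentence, toy_dict))           # ['研究', '生命', '起源']
print(disagreement(sentence, toy_dict))  # [2, 3]
```

In this toy case the two passes differ at offsets 2 and 3, so a second-stage classifier would only need to decide the boundaries around the characters 生 and 命; positions where both passes agree could be treated as lower-risk.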
