Title | Chinese Word Segmentation as Character Tagging |
---|---|
Volume:Issue | 8:1 |
Author | Xue, Nianwen |
Pages | 029-047 |
Keywords | Chinese word segmentation, character tagging, maximum entropy, supervised machine-learning, THCI Core |
Publication date | 2003-02 |
In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to assign each Chinese character, or hanzi, a tag that indicates its position within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach is competitive with other supervised machine-learning segmenters reported in previous studies: trained on a 237K-word training set, it achieves precision and recall of 95.01% and 94.94%, respectively.
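
To make the tagging idea concrete, the following is a minimal sketch of the two conversions the abstract describes: turning segmented words into per-character position tags (the form a tagger would be trained on), and turning tagged characters back into segmented text for scoring. The abstract does not spell out the tag inventory, so the four tags used here (`B` for word-initial, `M` for word-internal, `E` for word-final, `S` for a single-character word) are an assumed, commonly used scheme rather than necessarily the paper's own. The function names and the span-based precision/recall computation are likewise illustrative assumptions, and the maximum entropy tagger itself is not shown.

```python
from typing import List, Set, Tuple

# Assumed tag scheme (not taken from the abstract):
# B = first hanzi of a word, M = middle, E = last, S = single-hanzi word.

def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (hanzi, position tag) pairs."""
    tagged = []
    for word in words:
        if not word:
            continue  # ignore empty strings in the input
        if len(word) == 1:
            tagged.append((word, "S"))
        else:
            tagged.append((word[0], "B"))
            for ch in word[1:-1]:
                tagged.append((ch, "M"))
            tagged.append((word[-1], "E"))
    return tagged

def tags_to_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Convert (hanzi, tag) pairs back into a list of words.

    Inconsistent tag sequences (e.g. B followed by B) are tolerated by
    closing the pending word whenever a new word starts.
    """
    words = []
    current = ""
    for ch, tag in tagged:
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) character-offset span."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans

def precision_recall(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall based on exactly matching spans."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall

gold = ["我", "喜欢", "北京"]
tagged = words_to_tags(gold)
restored = tags_to_words(tagged)
```

In this sketch a predicted word counts as correct only if both of its boundaries match the gold segmentation, which is one common way to score segmenters; the paper's exact evaluation procedure may differ.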