Title | Customizable Segmentation of Morphologically Derived Words in Chinese |
---|---|
Volume/Issue | 8:1 |
Author | Wu, Andi |
Pages | 001-027 |
Keywords | segmentation standards, word-internal structures, customizable systems, morphologically derived words, THCI Core |
Publication date | 2003-02 |
The output of Chinese word segmentation can vary according to different linguistic definitions of words and different engineering requirements, and no single standard can satisfy all linguists and all computer applications. Most of the disagreements in language processing come from the segmentation of morphologically derived words (MDWs). This paper presents a system that can be conveniently customized to meet various user-defined standards in the segmentation of MDWs. In this system, all MDWs contain word trees where the root nodes correspond to maximal words and the leaf nodes to minimal words. Each non-terminal node in the tree is associated with a resolution parameter that determines whether its daughters are displayed as a single word or as separate words. Different segmentation outputs can then be obtained from different cuts of the tree, which the user specifies through different value combinations of those resolution parameters. We thus have a single system that can be customized to meet different segmentation specifications.
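
The word-tree idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Node` class, the `segment` function, the parameter names `plural_suffix` and `adj_noun`, and the example word 老朋友们 are all hypothetical, chosen only to show how different parameter settings produce different cuts of one tree. The sketch also assumes that a node whose parameter is set to "merge" hides everything below it, and that unspecified parameters default to "merge".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A word-tree node: leaves carry text, non-terminals name a parameter."""
    text: str = ""      # surface string (used by leaves only)
    param: str = ""     # hypothetical resolution-parameter name (non-terminals only)
    children: List["Node"] = field(default_factory=list)

    def surface(self) -> str:
        """Concatenate the leaves under this node into one string."""
        if not self.children:
            return self.text
        return "".join(child.surface() for child in self.children)


def segment(node: Node, merge: Dict[str, bool]) -> List[str]:
    """Return the words produced by cutting the tree according to `merge`.

    merge[param] == True  -> show the node's daughters as a single word
    merge[param] == False -> show the daughters separately (recursing into them)
    Parameters missing from `merge` default to True (an assumption of this sketch).
    """
    if not node.children:
        return [node.text]
    if merge.get(node.param, True):
        return [node.surface()]
    words: List[str] = []
    for child in node.children:
        words.extend(segment(child, merge))
    return words


# Hypothetical tree for 老朋友们: [[老 朋友] 们]
tree = Node(
    param="plural_suffix",
    children=[
        Node(param="adj_noun", children=[Node("老"), Node("朋友")]),
        Node("们"),
    ],
)

maximal = segment(tree, {})                                        # root stays one word
split_suffix = segment(tree, {"plural_suffix": False})             # cut at the root only
minimal = segment(tree, {"plural_suffix": False, "adj_noun": False})  # cut everywhere
```

With these settings the same tree yields one, two, or three words, which mirrors the abstract's point that a single system can serve several segmentation standards by changing only the parameter values.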