Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text
Volume/Issue: 17:3
Authors: Lin, Chuan-jie; Zhan, Jia-cheng; Chen, Yen-heng; Pao, Chien-wei
Pages: 87-108
Keywords: Semantic Chinese Word Segmentation; Japanese Name Identification; Character Variants; THCI Core
Publication date: 2012-09

English Abstract

This paper proposes an approach to identify word candidates that are not Traditional Chinese, including Japanese names (written in Japanese Kanji or Traditional Chinese characters) and word variants, when doing word segmentation on Traditional Chinese text. When handling personal names, a probability model concerning formats of names is introduced. We also propose a method to map Japanese Kanji into the corresponding Traditional Chinese characters. The same method can also be used to detect words written in character variants. After integrating generation rules for various types of special words, as well as their probability models, the F-measure of our word segmentation system rises from 94.16% to 96.06%. Another experiment shows that 83.18% of the 862 Japanese names in a set of 109 human-annotated documents can be successfully detected.
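The character-mapping idea in the abstract can be illustrated with a minimal sketch. It assumes a simple one-to-one lookup table from Japanese Kanji or variant forms to Traditional Chinese characters, followed by a lexicon check. The table entries, the toy lexicon, and the function names `normalize` and `is_variant_word` are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual mapping table or probability models.

```python
# Hypothetical mapping: Japanese Kanji / variant form -> Traditional Chinese character.
# Only a few illustrative entries; the paper's real table is not given in the abstract.
VARIANT_TO_TRADITIONAL = {
    "沢": "澤",
    "広": "廣",
    "竜": "龍",
    "峯": "峰",
}

# Hypothetical toy lexicon of Traditional Chinese words.
LEXICON = {"山峰", "廣場"}


def normalize(text: str) -> str:
    """Replace each mapped character with its Traditional Chinese counterpart."""
    return "".join(VARIANT_TO_TRADITIONAL.get(ch, ch) for ch in text)


def is_variant_word(candidate: str, lexicon=LEXICON) -> bool:
    """True if the candidate contains a mapped character and its
    normalized form is a known lexicon word."""
    normalized = normalize(candidate)
    return normalized != candidate and normalized in lexicon


if __name__ == "__main__":
    print(normalize("小沢"))          # Kanji surname rewritten in Traditional characters
    print(is_variant_word("山峯"))    # variant spelling of a lexicon word
    print(is_variant_word("山峰"))    # already standard, so not flagged as a variant
```

In a full segmenter, a candidate detected this way would presumably be scored alongside ordinary dictionary words rather than accepted outright; that scoring step is outside the scope of this sketch.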
