Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: A Study on Consistency Checking Method of Part-Of-Speech Tagging for Chinese Corpora
Volume/Issue: 13:2
Authors: Zhang, Hu; Zheng, Jiaheng
Pages: 157-169
Keywords: Multi-Category Words; Part of Speech Tagging; Consistency Checking; Chinese Corpus; Classification; THCI Core
Publication Date: June 2008


Abstract (English)

Ensuring the consistency of Part-Of-Speech (POS) tagging plays an important role in building high-quality Chinese corpora. After analyzing the POS tagging of multi-category words (words that can belong to more than one part of speech) in large-scale corpora, we propose in this paper a novel classification-based method for checking the consistency of POS tagging. The method builds a vector model of the context of multi-category words and uses the k-NN algorithm to classify context vectors constructed from POS tagging sequences, thereby judging whether their tagging is consistent. The method is evaluated on our 1.5M-word corpus, and the experimental results indicate that it is feasible and effective.
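The abstract describes the approach only at a high level, so the following Python sketch illustrates the general idea under explicitly assumed choices; it is not the authors' published design. Assumptions: each occurrence of a multi-category word is represented by (relative position, POS tag) features from a window of two tags on each side; neighbours are ranked by cosine similarity; k = 3; each occurrence is compared leave-one-out against the other occurrences of the same word; and an occurrence is flagged as possibly inconsistent when the k-NN majority tag differs from its assigned tag. The function names (`context_vector`, `knn_predict`, `check_consistency`) and the toy PKU-style tagged sentences are hypothetical.

```python
from collections import Counter
import math

def context_vector(tags, i, window=2):
    """Assumed encoding: one sparse feature per (relative position, POS tag)."""
    feats = Counter()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word's own tag
        j = i + offset
        tag = tags[j] if 0 <= j < len(tags) else "<PAD>"  # pad at sentence edges
        feats[(offset, tag)] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(vec, examples, k=3):
    """Majority tag among the k most similar (vector, tag) examples."""
    ranked = sorted(examples, key=lambda ex: cosine(vec, ex[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]

def check_consistency(sentences, word, k=3, window=2):
    """Flag occurrences of `word` whose tag disagrees with the k-NN vote.

    sentences: list of sentences, each a list of (token, tag) pairs.
    Returns (sentence_index, token_index, assigned_tag, predicted_tag) tuples.
    """
    occurrences = []
    for s_idx, sent in enumerate(sentences):
        tags = [t for _, t in sent]
        for t_idx, (tok, tag) in enumerate(sent):
            if tok == word:
                vec = context_vector(tags, t_idx, window)
                occurrences.append((s_idx, t_idx, vec, tag))
    flagged = []
    for n, (s_idx, t_idx, vec, tag) in enumerate(occurrences):
        # Leave-one-out: compare against every other occurrence of the word.
        others = [(v, t) for m, (_, _, v, t) in enumerate(occurrences) if m != n]
        if not others:
            continue
        predicted = knn_predict(vec, others, k)
        if predicted != tag:
            flagged.append((s_idx, t_idx, tag, predicted))
    return flagged

# Toy corpus (hypothetical): 发展 is a verb after 要 and a noun after 的.
corpus = [
    [("我们", "r"), ("要", "v"), ("发展", "v"), ("经济", "n")],
    [("他们", "r"), ("要", "v"), ("发展", "v"), ("农业", "n")],
    [("你们", "r"), ("要", "v"), ("发展", "v"), ("教育", "n")],
    [("我们", "r"), ("要", "v"), ("发展", "n"), ("工业", "n")],  # suspicious tag
    [("经济", "n"), ("的", "u"), ("发展", "n"), ("很", "d"), ("快", "a")],
    [("社会", "n"), ("的", "u"), ("发展", "n"), ("很", "d"), ("快", "a")],
    [("城市", "n"), ("的", "u"), ("发展", "n"), ("很", "d"), ("快", "a")],
]

if __name__ == "__main__":
    print(check_consistency(corpus, "发展", k=3))  # [(3, 2, 'n', 'v')]
```

In this toy run only the fourth sentence is flagged: its context matches the three verb occurrences, yet its own tag is `n`. A real checker would need richer context features and a principled choice of k and window size; the paper's actual feature set and parameters are not stated in this abstract.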
