Title | A Study on Consistency Checking Method of Part-Of-Speech Tagging for Chinese Corpora |
---|---|
Volume:Issue | 13:2 |
Authors | Zhang, Hu; Zheng, Jiaheng |
Pages | 157-169 |
Keywords | Multi-Category Words; Part of Speech Tagging; Consistency Checking; Chinese Corpus; Classification; THCI Core |
Publication date | 2008-06 |
Ensuring the consistency of part-of-speech (POS) tagging plays an important role in building high-quality Chinese corpora. After analyzing the POS tagging of multi-category words (words that can belong to more than one part of speech) in large-scale corpora, we propose a novel classification-based method for checking the consistency of POS tagging. The method builds a vector model of the context of multi-category words and uses the k-nearest-neighbor (k-NN) algorithm to classify context vectors constructed from POS tagging sequences and to judge their consistency. The method is evaluated on our 1.5M-word corpus, and the experimental results indicate that it is feasible and effective.
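The general idea (context vectors built from POS tag sequences, classified with k-NN, and then compared with the assigned tag) can be illustrated with a small sketch. The abstract does not give the feature set, window size, similarity measure, value of k, or decision rule. Everything below is therefore an assumption made for illustration: positional tag features in a ±2 window, cosine similarity, k = 3, leave-one-out comparison against other occurrences of the same word, and flagging an occurrence when the k-NN majority tag differs from its assigned tag. The toy sentences, the hypothetical `TOY_CORPUS`, and the tag names (loosely modeled on the Peking University tag set: r, v, n, u, d, a) are invented. This is not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus (invented for illustration): each sentence is a list of
# (word, POS tag) pairs. 研究 is a multi-category word (verb "v" or noun "n").
# The last sentence deliberately tags 研究 as a noun in a verb context.
TOY_CORPUS = [
    [("我们", "r"), ("研究", "v"), ("这个", "r"), ("问题", "n")],
    [("他们", "r"), ("研究", "v"), ("这个", "r"), ("方法", "n")],
    [("你们", "r"), ("研究", "v"), ("这个", "r"), ("问题", "n")],
    [("他", "r"), ("的", "u"), ("研究", "n"), ("很", "d"), ("好", "a")],
    [("我", "r"), ("的", "u"), ("研究", "n"), ("很", "d"), ("重要", "a")],
    [("她", "r"), ("的", "u"), ("研究", "n"), ("很", "d"), ("深入", "a")],
    [("我们", "r"), ("研究", "n"), ("这个", "r"), ("问题", "n")],
]


def context_vector(sentence, i, window=2):
    """Sparse vector of positional POS-tag features around position i (assumed design)."""
    feats = Counter()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word's own tag is the label, not a feature
        j = i + offset
        tag = sentence[j][1] if 0 <= j < len(sentence) else "<PAD>"
        feats[f"{offset}:{tag}"] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def check_consistency(corpus, word, k=3, window=2):
    """Return (sentence, position, assigned tag, k-NN tag) for suspect occurrences."""
    samples = []
    for s_idx, sent in enumerate(corpus):
        for i, (w, tag) in enumerate(sent):
            if w == word:
                samples.append((s_idx, i, context_vector(sent, i, window), tag))

    flagged = []
    for idx, (s_idx, i, vec, tag) in enumerate(samples):
        # Leave-one-out: compare against every other occurrence of the same word.
        scored = [(cosine(vec, other[2]), other[3])
                  for j, other in enumerate(samples) if j != idx]
        # sorted() is stable, so ties in similarity keep corpus order.
        neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
        if not neighbours:
            continue
        # Majority vote over the k nearest neighbours' tags.
        predicted = Counter(t for _, t in neighbours).most_common(1)[0][0]
        if predicted != tag:
            flagged.append((s_idx, i, tag, predicted))
    return flagged


if __name__ == "__main__":
    for s_idx, pos, tag, predicted in check_consistency(TOY_CORPUS, "研究"):
        print(f"sentence {s_idx}, token {pos}: tagged {tag}, neighbours suggest {predicted}")
```

On the toy data only the last sentence is flagged, because its three nearest neighbours are the three verb uses of 研究. In this sketch a flagged occurrence is treated as a candidate for manual review rather than corrected automatically.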