Title | The Design and Construction of the PolyU Shallow Treebank |
---|---|
Volume:Issue | 10:3 |
Authors | Xu, Ruifeng; Lu, Qin; Li, Yin; Li, Wanyin |
Pages | 397-415 |
Keywords | Shallow Treebank, Natural Language Processing, Corpus Annotation, Shallow Parsing, THCI Core |
Publication date | 2005-09 |
This paper presents the design and construction of the PolyU Treebank, a manually annotated Chinese shallow treebank. The PolyU Treebank is based on shallow annotation, in which only partial syntactic structures within sentences are annotated. Guided by the Phrase-Standard Grammar proposed by Peking University, the PolyU Treebank has been designed and constructed to provide a large amount of annotated data containing shallow syntactic information and limited semantic information for use in natural language processing (NLP) research. This paper describes the relevant design principles, annotation guidelines, and implementation issues, including how high-quality annotation was achieved through a well-designed annotation workflow and effective post-annotation checking tools. Currently, the PolyU Treebank consists of a one-million-word annotated corpus and has been used in a number of NLP research projects with promising results.