Title | Building a Bracketed Corpus Using φ² Statistics |
---|---|
Volume:Issue | 2:2 |
Authors | Lee, Yue-shi; Chen, Hsin-hsi |
Pages | 1-23 |
Keywords | Bracketed Corpus, φ² Statistics, Treebank, Probabilistic Chunkers, THCI Core |
Publication date | August 1997 |
Treebank-based research is ongoing for many natural language applications. However, building a large-scale treebank is laborious and time-consuming, so speeding up treebank construction has become an important task. This paper proposes two versions of probabilistic chunkers to aid the development of a bracketed corpus. The basic version partitions part-of-speech sequences into chunk sequences, which form a partially bracketed corpus. The recursive version applies the chunking action recursively to generate a fully bracketed corpus. Rather than a treebank, the training corpus is one tagged with part-of-speech information only. The experimental results show that the probabilistic chunker achieves a correct rate of more than 94% in producing a partially bracketed corpus and also gives very encouraging results in generating a fully bracketed corpus. Both versions of the chunker are simple but effective and can also be applied to many other natural language applications.
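
The abstract does not spell out how the chunkers use the φ² statistic, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that φ² is computed for adjacent part-of-speech tags from bigram counts in a tagged-only corpus, using the standard 2×2 contingency form φ² = (ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), and that a chunk boundary is placed wherever the association falls below a threshold. The function names `build_phi2` and `chunk`, the `threshold` parameter, and the toy tag sequences are all hypothetical.

```python
from collections import Counter


def build_phi2(tagged_sentences):
    """Return phi2(x, y) estimated from adjacent-tag counts.

    tagged_sentences: iterable of lists of POS tags (no treebank needed).
    """
    bigrams = Counter()
    for tags in tagged_sentences:
        bigrams.update(zip(tags, tags[1:]))
    total = sum(bigrams.values())

    left = Counter()   # how often a tag occurs as the first element of a bigram
    right = Counter()  # how often a tag occurs as the second element
    for (x, y), n in bigrams.items():
        left[x] += n
        right[y] += n

    def phi2(x, y):
        # 2x2 contingency table over all observed bigrams.
        a = bigrams[(x, y)]      # x followed by y
        b = left[x] - a          # x followed by something else
        c = right[y] - a         # something else followed by y
        d = total - a - b - c    # neither
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            return 0.0
        # Note: phi^2 is unsigned, so it does not distinguish positive
        # from negative association between x and y.
        return (a * d - b * c) ** 2 / denom

    return phi2


def chunk(tags, phi2, threshold):
    """Assumed boundary rule: split where phi2(prev, cur) < threshold."""
    if not tags:
        return []
    chunks = [[tags[0]]]
    for prev, cur in zip(tags, tags[1:]):
        if phi2(prev, cur) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks


# Toy POS-tagged corpus (hypothetical data, for illustration only).
corpus = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["DT", "JJ", "NN", "VB"],
    ["PRP", "VB", "DT", "NN"],
]
phi2 = build_phi2(corpus)
partial_brackets = chunk(["DT", "NN", "VB", "DT", "NN"], phi2, threshold=0.5)
```

On a corpus this small the resulting brackets are not linguistically meaningful; the sketch only shows the shape of the computation. A recursive variant in the spirit of the abstract would treat each chunk as a unit and chunk the resulting sequence again, but how units are relabelled and scored at each level is not stated in the abstract and is left out here.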