Article Details

International Journal of Computational Linguistics And Chinese Language Processing (THCI)

Title: Building a Bracketed Corpus Using φ² Statistics
Volume/Issue: 2:2
Authors: Lee, Yue-shi; Chen, Hsin-hsi
Pages: 001-023
Keywords: Bracketed Corpus; φ² Statistics; Treebank; Probabilistic Chunkers; THCI Core
Publication date: 1997-08

Chinese Abstract

English Abstract

Research based on treebanks is ongoing for many natural language applications. However, building a large-scale treebank is laborious and time-consuming, so speeding up the process has become an important task. This paper proposes two versions of probabilistic chunkers to aid the development of a bracketed corpus. The basic version partitions part-of-speech sequences into chunk sequences, which form a partially bracketed corpus. The recursive version applies the chunking action recursively to generate a fully bracketed corpus. Rather than a treebank, the training corpus is one tagged with part-of-speech information only. The experimental results show that the probabilistic chunker has a correct rate of more than 94% in producing a partially bracketed corpus and also gives very encouraging results in generating a fully bracketed corpus. These two versions of chunkers are simple but effective and can also be applied to many natural language applications.

Related Literature