Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Construction and Automatization of a Minnan Child Speech Corpus with some Research Findings
Volume/Issue: 12:4
Author: Tsay, Jane S.
Pages: 411-441
Keywords: Minnan, Automatic Word Segmentation, CHILDES, Child Language, Speech Corpus, Taiwanese, Taiwan Southern Min, THCI Core
Publication date: December 2007

English Abstract

The Taiwanese Child Language Corpus (TAICORP) is a corpus based on spontaneous conversations between young children and their adult caretakers in Minnan (Taiwan Southern Min) speaking families in Chiayi County, Taiwan. This corpus is special in several ways: (1) it is a Minnan corpus; (2) it is a speech-based corpus; (3) it is a corpus of a language that does not yet have a conventionalized orthography; (4) it is a collection of longitudinal child language data; and (5) it is one of the largest child corpora in the world, with about two million syllables in 497,426 lines (utterances) based on about 330 hours of recordings. For its format, TAICORP adopted the Child Language Data Exchange System (CHILDES) [MacWhinney and Snow 1985; MacWhinney 1995] for transcribing and coding the recordings into machine-readable text. The goals of this paper are to introduce the construction of this speech-based corpus and, at the same time, to discuss some problems and challenges encountered. The development of an automatic word segmentation program with a spell-checker is also discussed. Finally, some findings on syllable distribution are reported.
