Title | Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpora for Concatenation-based TTS |
---|---|
Volume/Issue | 10:2 |
Authors | Lin, Cheng-yuan; Jang, Jyh-shing Roger; Chen, Kuan-ting |
Pages | 145-166 |
Keywords | speech assessment methods phonetic alphabet; speech corpus; sequential forward selection; leave-one-out; k-nearest neighbor rule; speaker-adapted model; context-dependent hidden Markov model; THCI Core |
Publication date | 2005-06 |

Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a labeling method that suits one language does not necessarily produce desirable results for another. Hence, in this paper, we propose a new procedure for refining the boundaries of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach.