Article details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title 可變速中文文字轉語音系統
Volume/Issue 17:1
Parallel title Variable Speech Rate Mandarin Chinese Text-to-Speech System
Authors 江振宇, 黃啟全, 王逸如, 余秀敏, 陳信宏
Pages 027-041
Keywords 文字轉語音系統; 中文韻律; 語速; 停頓預估; Text-to-Speech System; Mandarin Prosody; Speech Rate; Break Prediction; THCI Core
Publication date 2012-03

Chinese abstract

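This paper describes a variable speech rate Mandarin Chinese text-to-speech system built on hidden Markov models (HMMs). The training data is a parallel corpus recorded at three different speech rates. A context-dependent HMM is trained for each rate, and the speech rate is adjusted by interpolating the models with a weight assigned to each. Inspection of the corpus also shows that slow speech contains more silent pauses and fast speech fewer, so the traditional simple method of placing pauses at punctuation marks is unsuitable for variable-rate synthesis. This study therefore adds a pause prediction mechanism: a pause prediction decision tree is trained for each speech rate, and the probabilities from these decision trees are interpolated with adjustable weights to predict pauses at different speech rates. To evaluate the system, we conducted objective and subjective tests. The objective tests assess pause prediction performance and measure the error between synthesized and target speech. The subjective tests compare the naturalness of synthesized speech across combinations of the two sets of weights, namely the HMM weights and the pause decision tree weights. The results show that the two sets of weights must match for the synthesized speech to sound more natural. We expect that a system built with the proposed method is better suited to human-machine interaction than a traditional single-rate text-to-speech system.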
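
The pause prediction step described above can be illustrated with a minimal sketch. It assumes that each of the three decision trees has already produced a pause probability for the same word boundary, that the interpolation is a weighted average normalized by the sum of the weights, and that a pause is inserted when the result reaches a threshold of 0.5. The function names, the example probabilities, and the threshold are hypothetical; the abstract does not specify them.

```python
from typing import Sequence


def interpolate_pause_probability(
    probs: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted average of per-rate pause probabilities (fast, medium, slow)."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * p for w, p in zip(weights, probs)) / total


def should_insert_pause(
    probs: Sequence[float], weights: Sequence[float], threshold: float = 0.5
) -> bool:
    """Decide on a pause; the 0.5 threshold is an assumption of this sketch."""
    return interpolate_pause_probability(probs, weights) >= threshold


# Hypothetical pause probabilities at one word boundary from the
# fast, medium, and slow decision trees, in that order.
boundary_probs = (0.1, 0.4, 0.9)

# Weights leaning toward slow speech make a pause more likely here.
slowish = should_insert_pause(boundary_probs, (0.0, 0.5, 0.5))
# Weights leaning toward fast speech make it less likely.
fastish = should_insert_pause(boundary_probs, (0.5, 0.5, 0.0))
```

With the same boundary, shifting weight from the fast tree to the slow tree raises the interpolated probability, which matches the corpus observation that slower speech contains more pauses.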

English abstract

This paper presents a Hidden Markov Model (HMM)-based variable speech rate Mandarin Chinese text-to-speech (TTS) system. In this system, the spectrum, fundamental frequency, and state duration parameters are generated by a context-dependent HMM (CDHMM) whose model parameters are linearly interpolated from those of three CDHMMs trained on corpora at three different speech rates (SRs), i.e., fast, medium, and slow. In addition, three decision tree (DT)-based pause break predictors trained on the three SR corpora are used to interpolate the probabilities for inserting pause breaks. The performance of the proposed TTS system was evaluated by several objective and subjective tests. Experimental results suggest that coherence between the interpolation weights for the CDHMMs and those for the DT-based pause predictors is crucial for the naturalness of the synthesized speech at variable SRs. We believe that the proposed variable speech rate Mandarin Chinese TTS system is more suitable than conventional fixed-SR TTS systems for human-machine interaction applications.
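
As a rough illustration of the linear interpolation of CDHMM parameters mentioned in the abstract, the sketch below interpolates one parameter vector per speech rate. It assumes the three models have corresponding states whose parameter vectors can be combined element by element, and it normalizes by the sum of the weights. The names `RATES` and `interpolate_vectors` and the duration values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

RATES = ("fast", "medium", "slow")


def interpolate_vectors(
    vectors: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Linearly interpolate same-length parameter vectors, one per speech rate."""
    total = sum(weights[r] for r in RATES)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(vectors[RATES[0]])
    if any(len(vectors[r]) != dim for r in RATES):
        raise ValueError("all vectors must have the same dimension")
    return [
        sum(weights[r] * vectors[r][i] for r in RATES) / total
        for i in range(dim)
    ]


# Hypothetical mean state durations (in frames) for one corresponding state
# in the fast, medium, and slow models.
duration_means = {"fast": [3.0], "medium": [5.0], "slow": [8.0]}

# Equal weight on the medium and slow models gives a rate between the two.
between_medium_and_slow = interpolate_vectors(
    duration_means, {"fast": 0.0, "medium": 0.5, "slow": 0.5}
)
```

The same function could be applied to spectral or fundamental frequency mean vectors. How variances are combined and how states are matched across the three models are left out of this sketch.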
