Article Details

電腦與通訊

Title: 適用於個人化車載資訊播報系統之語者調適語音合成技術
Issue: 147
Parallel title: Speaker Adaptive Speech Synthesis Technology for Personalized In-vehicle Information Broadcasting System
Authors: 林政源, 林政賢, 黃柏凱, 郭志忠
Pages: 91-98
Keywords: HMM-based Speech Synthesis; Text To Speech; Speaker Adaptation; Personalized In-vehicle Information Broadcasting System
Publication date: October 2012

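Abstract (translated from Chinese)

A key issue in applying speech synthesis to a personalized in-vehicle information broadcasting system is how to collect recorded speech efficiently for speaker adaptation. In this paper, we propose two sentence selection methods based on a greedy algorithm: a phone coverage method and a model coverage method. The former considers the syllable information of the adaptation data, while the latter considers how often the Mel-cepstral and log F0 models of the average voice model occur. To verify the feasibility of the methods, we compare them with random selection in subjective and objective evaluations. The objective results show that speech synthesized with the model coverage method has lower Mel-cepstral distortion and a lower log F0 root-mean-square error. The subjective results show that the phone and model coverage methods are clearly better than random selection.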

English Abstract

A main concern in applying personalized speech synthesis to an in-vehicle information broadcasting system is how to collect recording data efficiently for speaker adaptation. In this paper, we present two sentence selection approaches based on the greedy algorithm: one based on phone coverage and the other on model coverage. The former considers the phonetic information of the adaptation data, and the latter considers the occurrences of Mel-cepstral and log F0 models in the decision trees of the average voice model. To verify the feasibility of the proposed methods, we compare them with random selection in objective and subjective evaluations. The objective evaluation results show that the model coverage based approach generates synthetic speech with lower Mel-cepstral distortion and lower log F0 RMSE. The subjective evaluation results indicate that both the phone coverage and model coverage approaches are clearly preferred over random selection.
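The greedy selection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give their scoring function, so the sketch assumes the simplest criterion, in which each step picks the sentence that adds the most units not yet covered. A "unit" stands for a phone label in the phone coverage case, or for a leaf of the Mel-cepstral or log F0 decision tree in the model coverage case. The function name `greedy_select`, the `budget` parameter, and the toy corpus are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentences, each time taking the sentence that
    covers the most units not yet covered (greedy set-cover heuristic)."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)  # shallow copy; the input dict is not modified
    while remaining and len(chosen) < budget:
        # Iterate over sorted ids so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy data (hypothetical): sentence id -> set of units the sentence contains.
corpus = {
    "s1": {"a", "b", "c"},
    "s2": {"c", "d"},
    "s3": {"a", "b"},
    "s4": {"d", "e", "f", "g"},
}
selected = greedy_select(corpus, budget=2)
print(selected)
```

On the toy corpus, two sentences already cover every unit, so raising the budget selects nothing further. A random-selection baseline would draw the same number of sentences without looking at coverage.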
