Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: 基於字元階層之語音合成用文脈訊息擷取
Volume/Issue: 21:2
Parallel title: Character-Level Linguistic Features Extraction for Text-to-Speech System
Authors: 陳冠宏, 廖書漢, 廖元甫, 王逸如
Pages: 71-84
Keywords: Speech Synthesis (語音合成), Linguistic Features (文脈訊息), Word2vec (文字向量), RNNLM (遞迴類神經網路語言模型)
Index: THCI Core
Publication date: 2016-12

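Chinese Abstract (translated)

Good linguistic features are a key part of speech synthesis. Traditional linguistic features all rely on natural language processing (NLP), that is, on using a parser to analyze the text. However, a parser is difficult to design and cannot be tailored specifically to speech synthesis. We therefore want to build an end-to-end speech synthesis system that uses characters directly as the processing unit. Following this idea, we use character-level word2vec and a recurrent neural network to convert the input character sequence directly into hidden feature vectors that serve as the linguistic features for speech synthesis. Finally, we test this idea on a mixed Chinese-English speech synthesis system. The speech synthesis experiments show that the proposed method does perform better than the traditional parser-based approach.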

English Abstract

High-quality linguistic features are key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Because a good parser requires a lot of feature engineering to build, it is usually a general-purpose one and is often not specially designed for speech synthesis. To avoid these difficulties, we propose replacing the conventional NLP parser with a character embedding and a character-level recurrent neural network language model (RNNLM) module that directly converts input character sequences, character by character, into latent linguistic feature vectors. Experimental results on a Chinese-English speech synthesis system showed that the proposed approach achieved performance comparable to traditional NLP parser-based methods.
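
The pipeline described in the abstract (character embedding followed by a recurrent network that emits one latent feature vector per input character) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `CharRNNFeatureExtractor`, the helper functions, the dimensions, and the randomly initialised weights are all assumptions made for illustration. In the paper the embedding would come from character-level word2vec training and the recurrent weights from RNNLM training; here a plain Elman-style update with untrained weights stands in for both.

```python
import math
import random


def build_vocab(texts):
    """Map every distinct character in the sample texts to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i for i, ch in enumerate(chars)}


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would supply these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class CharRNNFeatureExtractor:
    """Hypothetical, untrained character embedding + Elman-style recurrent layer."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        # One embedding row per known character, plus one row for unknown characters.
        self.embeddings = random_matrix(len(vocab) + 1, embed_dim, rng)
        self.w_in = random_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.bias = [0.0] * hidden_dim

    def extract(self, text):
        """Return one hidden-state vector per character of `text`."""
        hidden = [0.0] * self.hidden_dim
        features = []
        for ch in text:
            idx = self.vocab.get(ch, len(self.vocab))  # unknown chars share the last row
            from_input = matvec(self.w_in, self.embeddings[idx])
            from_state = matvec(self.w_rec, hidden)
            hidden = [
                math.tanh(a + b + c)
                for a, b, c in zip(from_input, from_state, self.bias)
            ]
            features.append(hidden)
        return features


# Mixed Chinese-English input, processed character by character.
vocab = build_vocab(["今天天氣很好", "I love TTS"])
extractor = CharRNNFeatureExtractor(vocab)
features = extractor.extract("今天 TTS 很好")
print(len(features), len(features[0]))  # 9 16
```

Because the hidden state carries over from one character to the next, the two occurrences of `T` in the example receive different feature vectors, which is the property that lets such vectors stand in for parser-derived context.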
