HyRead Journal Taiwan Full-Text Database

Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Natural Sciences / Information / Technology

Title	基於字元階層之語音合成用文脈訊息擷取
Volume:Issue	21:2
Parallel title	Character-Level Linguistic Features Extraction for Text-to-Speech System
Authors	陳冠宏, 廖書漢, 廖元甫, 王逸如
Pages	71-84
Keywords	Speech Synthesis, Linguistic Features, Word2vec, RNNLM, THCI Core
Publication date	2016-12

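Abstract (translated from the Chinese)

High-quality linguistic features are a key part of speech synthesis. Traditional linguistic features all depend on natural language processing (NLP), that is, on using a parser to analyze the text. However, a parser is difficult to design and cannot be tailored specifically to speech synthesis. We therefore aim to build an end-to-end speech synthesis system that takes the character as its processing unit. Following this idea, we use character-level word2vec and a recurrent neural network to convert the input character sequence directly into hidden feature vectors, which serve as the linguistic features for speech synthesis. Finally, we test the idea on a mixed Chinese-English speech synthesis system. The experimental results show that the proposed method does perform better than the traditional parser-based approach.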

English Abstract

High-quality linguistic features are key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Because a good parser requires a lot of feature engineering to build, it is usually a general-purpose one and is often not specially designed for speech synthesis. To avoid these difficulties, we propose replacing the conventional NLP parser with a character embedding and a character-level recurrent neural network language model (RNNLM) module that directly converts input character sequences, character by character, into latent linguistic feature vectors. Experimental results on a Chinese-English speech synthesis system showed that the proposed approach achieved performance comparable to that of traditional NLP parser-based methods.
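
As a rough illustration of the pipeline the abstract describes (character embedding, then a recurrent network whose hidden state is read out once per character), the following Python sketch may help. It is not the authors' system: the embedding matrix and RNN weights are random placeholders standing in for a trained character-level word2vec model and RNNLM, and every name in it (`build_vocab`, `char_features`, the dimensions 8 and 16, the sample sentence) is a hypothetical choice made for this example.

```python
import numpy as np


def build_vocab(text):
    """Map each distinct character to an integer id."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def char_features(text, vocab, emb, w_in, w_rec, bias):
    """Run a simple Elman-style RNN over the characters.

    Returns one hidden-state vector per input character; these vectors
    play the role of the latent linguistic features.
    """
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for ch in text:
        x = emb[vocab[ch]]  # character embedding lookup
        hidden = np.tanh(w_in @ x + w_rec @ hidden + bias)
        outputs.append(hidden)
    return np.stack(outputs)


# Assumption: random weights stand in for trained parameters.
rng = np.random.default_rng(0)
text = "今天天氣很好 today"  # mixed Chinese-English input, handled per character
vocab = build_vocab(text)
emb_dim, hid_dim = 8, 16
emb = rng.normal(scale=0.1, size=(len(vocab), emb_dim))
w_in = rng.normal(scale=0.1, size=(hid_dim, emb_dim))
w_rec = rng.normal(scale=0.1, size=(hid_dim, hid_dim))
bias = np.zeros(hid_dim)

features = char_features(text, vocab, emb, w_in, w_rec, bias)
print(features.shape)  # one feature vector per character
```

Because the input is consumed character by character, Chinese and English text pass through the same code path without word segmentation or parsing, which is the property the abstract emphasizes. In a real system the feature matrix would be handed to the acoustic model of the synthesizer; that stage is outside this sketch.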
