Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title 使用字典學習法於強健性語音辨識
Volume/Issue 21:2
Parallel title The Use of Dictionary Learning Approach for Robustness Speech Recognition
Authors 顏必成, 石敬弘, 劉士弘, 陳柏琳
Pages 35-54
Keywords Robustness, Automatic Speech Recognition, Modulation Spectrum, Sparse Coding, Dictionary Learning (強健性, 自動語音辨識, 調變頻譜, 稀疏編碼, 字典學習法); THCI Core
Publication date December 2016

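Chinese Abstract (in English translation)

The performance of automatic speech recognition (ASR) systems often degrades markedly in noisy environments. This paper studies speech robustness techniques, aiming to extract more robust speech features by normalizing the modulation spectrum of the features. To this end, we apply a dictionary learning method based on K-singular value decomposition (K-SVD) to decompose the magnitude component of the modulation spectrum, minimizing the signal reconstruction error under a sparsity constraint on the weight matrix, in order to obtain more robust speech features. In addition, because the magnitude components of the modulation spectrum are all non-negative, we propose a non-negative K-SVD method that respects this property, with the aim of improving the noise robustness of ASR systems. All experiments were conducted on the widely used Aurora-2 connected-digit corpus. The results show that, compared with the baseline that uses only Mel-frequency cepstral coefficients (MFCC) and with other common modulation spectrum decomposition methods, both the proposed dictionary learning method and its improved variant significantly reduce the speech recognition error rate. Finally, we also combine the proposed dictionary learning methods with several classic robustness techniques, such as the Advanced Front-End (AFE) standard, cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ), to verify their practicality.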

English Abstract

The performance of automatic speech recognition (ASR) often degrades dramatically in noisy environments. In this paper, we present a novel use of the dictionary learning approach to normalize the magnitude modulation spectra of speech features so as to retain more noise-resistant and important acoustic characteristics. To this end, we employ the K-SVD method to create sparse representations over a common set of basis vectors that span the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. In addition, taking into account the non-negativity of the amplitude modulation spectrum, we use the nonnegative K-SVD method, paired with the nonnegative sparse coding method, to capture more noise-robust features. All experiments were conducted on the Aurora-2 corpus and task. The empirical evidence shows that our methods can offer substantial improvements over the baseline NMF method. Finally, we also integrate the proposed variants of the K-SVD method with other well-known robustness methods such as Advanced Front-End (AFE), Cepstral Mean and Variance Normalization (CMVN), and Histogram Equalization (HEQ) to further confirm their utility.
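The idea of normalizing a magnitude modulation spectrum with a learned dictionary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed-length feature trajectories, uses plain K-SVD with orthogonal matching pursuit (OMP) for sparse coding, and clips negative reconstructed magnitudes instead of implementing the nonnegative K-SVD variant. The function names (`omp`, `ksvd`, `normalize_trajectory`) and all parameter values are hypothetical, and the training data here is synthetic.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k columns of D."""
    residual, support = x.copy(), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if j in support:                            # no new atom helps; stop
            break
        support.append(j)
        # Least-squares refit on the whole current support.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def ksvd(X, n_atoms, k, n_iter=10, seed=0):
    """Basic K-SVD. X is (dim, n_samples); returns D (dim, n_atoms) and codes A."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise atoms from randomly chosen training columns.
    idx = rng.choice(X.shape[1], size=n_atoms, replace=False)
    D = X[:, idx].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding stage: code every training column with OMP.
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # Dictionary update stage: refit one atom at a time.
        for j in range(n_atoms):
            users = np.flatnonzero(A[j])            # samples that use atom j
            if users.size == 0:
                continue
            # Reconstruction error of those samples with atom j removed.
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                       # best rank-1 fit: new atom
            A[j, users] = s[0] * Vt[0]              # and its coefficients
    return D, A

def normalize_trajectory(traj, D, k):
    """Replace the magnitude modulation spectrum by its sparse reconstruction."""
    spec = np.fft.rfft(traj)                        # DFT along the time axis
    mag, phase = np.abs(spec), np.angle(spec)
    # Plain K-SVD does not guarantee non-negative output, so clip (assumption).
    new_mag = np.maximum(D @ omp(D, mag, k), 0.0)
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(traj))

# Synthetic stand-in for clean training trajectories of one feature dimension.
rng = np.random.default_rng(0)
clean = rng.standard_normal((40, 16))               # 40 trajectories, 16 frames
X_train = np.abs(np.fft.rfft(clean, axis=1)).T      # (9, 40) magnitude spectra
D, _ = ksvd(X_train, n_atoms=12, k=3, n_iter=3)
noisy = clean[0] + 0.5 * rng.standard_normal(16)
restored = normalize_trajectory(noisy, D, k=3)
```

In this sketch the dictionary is learned once from clean magnitude spectra and then applied to each noisy trajectory, while the original phase is kept unchanged. A nonnegative variant would constrain both the atoms and the coefficients to be non-negative rather than clipping the result afterwards.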

Related Literature