Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: 基於深度聲學模型其狀態精確度最大化之強健語音特徵擷取的初步研究
Volume/Issue: Vol. 25, No. 2
Parallel title: The Preliminary Study of Robust Speech Feature Extraction based on Maximizing the Accuracy of States in Deep Acoustic Models
Authors: 張立家, 洪志偉
Pages: 85-98
Keywords: 雜訊強健性之語音特徵; 語音辨識; 深度學習; Noise-robust Speech Feature; Speech Recognition; Deep Learning; THCI Core
Publication date: December 2020

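Abstract (translated from Chinese)

In this study, we propose a novel robust speech feature extraction technique to improve speech recognition performance in noisy environments. The technique uses information provided by the original acoustic model in the back end of the speech recognition system. Without retraining the acoustic model, a deep neural network learns the speech features that maximize the accuracy of the acoustic model's states, which makes these features robust to noise. Compared with other techniques that achieve noise robustness by improving the acoustic model, the proposed technique has the advantages of low computational cost and fast training. In preliminary experiments, we evaluated it on TIMIT, a medium-sized corpus. The results show that, relative to the baseline experiment, the proposed feature extraction method effectively reduces the phone error rate of speech under various noise types and noise levels, highlighting the method's effectiveness and its potential for further development.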

English Abstract

In this study, we develop a novel speech feature extraction technique for noise-robust speech recognition that uses information from the back-end acoustic models. Without retraining or adapting the back-end acoustic models, we use deep neural networks to learn a front-end speech feature representation that maximizes the state accuracy obtained from the original acoustic models. Compared with robustness methods that retrain or adapt acoustic models, the presented method has lower computational complexity and trains faster. In preliminary evaluation experiments on the medium-sized TIMIT database and task, the presented method achieves lower phone error rates than the baseline under various noise types and levels. The method is therefore promising and worth developing further.

Related Literature