Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: 語音辨識使用統計圖等化方法
Volume/Issue: 17:4
Parallel title: Speech Recognition Leveraging Histogram Equalization Methods
Authors: 謝欣汝, 洪志偉, 陳柏琳
Pages: 069-084
Keywords: Speech Recognition; Noise Robustness; Histogram Equalization; Feature Contextual Statistics; THCI Core
Publication date: 2012-12

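Chinese Abstract (English translation)

Histogram equalization (HEQ) is a conceptually simple and effective speech
feature processing technique that has been widely studied and applied in robust
speech recognition in recent years. In this paper, we continue this line of
research and propose a series of feature robustness methods that use the
spatial-temporal contextual statistics of speech features. The approach applies
simple differencing and averaging operations to the cepstral features of speech
to obtain contextual statistics of the features, which are then normalized and
combined. Unlike conventional dimension-wise HEQ, which normalizes each feature
dimension independently, these new methods further normalize the feature
distribution information across different spatial and temporal positions. They
can therefore reduce the mismatch caused by different acoustic environments, and
they attempt to address a problem that conventional HEQ cannot compensate for,
namely the effect of random noise on speech. All speech recognition experiments
in this paper are conducted on Aurora-2, an internationally used continuous
speech corpus. The experimental results show that the proposed methods perform
well compared with many well-known feature enhancement methods.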

English Abstract

Histogram equalization (HEQ) of speech features has received considerable
attention in the field of robust speech recognition due to its simplicity and excellent
performance. This paper is a continuation of this general line of research,
presenting a novel HEQ-based feature normalization framework which takes
advantage of joint equalization of spatial-temporal contextual statistics of speech
features. In doing so, we explore the use of simple differencing and averaging
operations to capture the contextual statistics of feature vector components for
speech feature normalization. All experiments are conducted on the Aurora-2
database and task. Experimental results show that for clean-condition training, the
methods instantiated from this framework achieve considerable word error rate
reductions over the baseline system, which are indeed quite comparable to other
conventional methods.
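
The abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a standard normal reference distribution, a rank-based CDF estimate, a two-frame temporal context, and a plain sum to recombine the equalized streams; the function names heq and temporal_context_heq are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def heq(x):
    """Equalize one feature track (1-D) to a standard normal reference.

    Assumption: the empirical CDF is estimated from ranks within the utterance.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x))      # rank of each frame, 0..n-1
    cdf = (ranks + 0.5) / n                # strictly inside (0, 1)
    inv = NormalDist().inv_cdf             # inverse CDF of N(0, 1)
    return np.array([inv(p) for p in cdf])


def temporal_context_heq(features):
    """Equalize averaged and differenced streams, then recombine them.

    features: array of shape (T, D), e.g. T frames of D cepstral coefficients.
    Assumption: context is the previous frame only; the first frame is its
    own predecessor, so its difference is zero.
    """
    features = np.asarray(features, dtype=float)
    prev = np.vstack([features[:1], features[:-1]])
    avg = 0.5 * (features + prev)          # averaging: smooth component
    diff = 0.5 * (features - prev)         # differencing: detail component
    out = np.empty_like(features)
    for d in range(features.shape[1]):
        # Each stream is normalized separately; summing them is an assumption.
        out[:, d] = heq(avg[:, d]) + heq(diff[:, d])
    return out
```

Before equalization the two streams sum back to the original feature (avg + diff equals the current frame), which is why a sum is used here to recombine them after each has been normalized.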

Related Literature