Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: 探究端對端混合模型架構於華語語音辨識
Volume/Issue: 24:1
Parallel Title: An Investigation of Hybrid CTC-Attention Modeling in Mandarin Speech Recognition
Authors: 張修瑞, 趙偉成, 羅天宏, 陳柏琳
Pages: 39-50
Keywords: CTC; Attention; Attention-based Encoder-Decoder; End-to-End Mandarin Chinese Speech Recognition; Short Utterance Recognition; THCI Core
Publication Date: June 2019

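Chinese Abstract (English translation)

In recent years, the emergence of end-to-end speech recognition has simplified many of the complicated procedures of conventional speech recognition. The two principal model architectures in end-to-end speech recognition are connectionist temporal classification (CTC) and the attention model. This paper attempts to combine these two architectures (i.e., the CTC-Attention hybrid model) for Mandarin meeting speech recognition, with the aim of further improving recognition performance. To this end, we analyze the effect of adjusting the combination weight when the two models are combined, and further investigate how well the CTC-Attention hybrid model recognizes short utterances. Experimental results on a Mandarin Chinese meeting corpus show that, compared with the conventional TDNN-LFMMI model, the CTC-Attention hybrid model generalizes better when utterances are short.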

English Abstract

The recent emergence of end-to-end automatic speech recognition (ASR) frameworks has streamlined the complicated modeling procedures of ASR systems, in contrast to conventional deep neural network-hidden Markov model (DNN-HMM) ASR systems. Among the most popular end-to-end ASR approaches are connectionist temporal classification (CTC) and the attention-based encoder-decoder model (Attention Model). In this paper, we explore the utility of combining CTC and the attention model in an attempt to yield better ASR performance. We also analyze the impact of the combination weight and the performance of the resulting CTC-Attention hybrid system on recognizing short utterances. Experiments on a Mandarin Chinese meeting corpus demonstrate that the CTC-Attention hybrid system delivers better performance on short utterance recognition than one of the state-of-the-art DNN-HMM settings, namely the so-called TDNN-LFMMI system.
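
The abstract refers to a combination weight but does not state how the two models are combined. A common formulation in the hybrid CTC-attention literature interpolates the two training losses linearly. The sketch below illustrates that formulation only and may not match the paper's exact setup; the function name `hybrid_loss` and the parameter `ctc_weight` are hypothetical names chosen for this example.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float) -> float:
    """Linearly interpolate a CTC loss and an attention loss.

    Assumption: the combination weight lies in [0, 1] and multiplies the
    CTC term; the attention term receives the remaining (1 - weight).
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # Illustrative loss values only; these are not results from the paper.
    for weight in (0.0, 0.5, 1.0):
        print(weight, hybrid_loss(2.0, 4.0, weight))
```

Under this assumption, a weight of 0 reduces to a pure attention model and a weight of 1 to a pure CTC model, so studying the combination weight amounts to sweeping between those two extremes.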

Related Literature