Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

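Title: NSYSU+CHT團隊於2020遠場語者驗證比賽之語者驗證系統
Volume/Issue: 25:2
Parallel Title: NSYSU+CHT Speaker Verification System for Far-Field Speaker Verification Challenge 2020
Authors: 張育嘉, 陳嘉平, 蕭善文, 詹博丞, 呂仲理
Pages: 055-068
Keywords: far-field speaker verification, time-delay neural network, convolutional neural network, time-delay residual neural network, GhostVLAD, Speaker Verification, TDNN, CNN, TDResNet, THCI Core
Publication Date: December 2020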

Chinese Abstract

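In this paper, we describe the system implemented by the NSYSU+CHT team for the 2020 Far-field Speaker Verification Challenge (FFSVC 2020). Each single system is an embedding-based speaker recognition system. The front-end feature extractor combines the strengths of time-delay neural network and convolutional neural network modules in an architecture called the time-delay residual neural network. In the pooling layer, we experimented with two methods: statistics pooling and GhostVLAD. The back-end scorer uses probabilistic linear discriminant analysis, which we train and adapt and then use to fuse the different systems. We took part in the text-dependent (Task 1) and text-independent (Task 2) speaker verification tasks of FFSVC 2020, both of which use single-microphone-array data. Our proposed system achieves minDCF 0.7703 and EER 9.94% on Task 1, and minDCF 0.8762 and EER 10.31% on Task 2.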

English Abstract

In this paper, we describe the system that Team NSYSU+CHT implemented for the 2020 Far-field Speaker Verification Challenge (FFSVC 2020). The single systems are embedding-based neural speaker recognition systems. The front-end feature extractor is a neural network architecture called TDResNet, which is built from TDNN and CNN modules and combines the advantages of both. In the pooling layer, we experimented with different methods, namely statistics pooling and GhostVLAD. The back-end is a PLDA scorer. We evaluate PLDA training and adaptation and use PLDA for system fusion. We participated in the text-dependent (Task 1) and text-independent (Task 2) speaker verification tasks on the single-microphone-array data of FFSVC 2020. The best results we achieved with the proposed methods are minDCF 0.7703 and EER 9.94% on Task 1, and minDCF 0.8762 and EER 10.31% on Task 2.
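
As a rough illustration of the statistics pooling mentioned in the abstract, the sketch below collapses a variable-length sequence of frame-level feature vectors into one fixed-length vector by concatenating the per-dimension mean and standard deviation over time. It is a minimal sketch, not the authors' implementation: the function name `statistics_pooling`, the use of the population standard deviation, and the toy input are assumptions, since the abstract does not specify these details.

```python
from math import sqrt
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool (num_frames x feat_dim) features into a vector of length 2 * feat_dim.

    The result is the per-dimension mean followed by the per-dimension
    standard deviation, both taken over the time (frame) axis.
    Assumption: population standard deviation (divide by num_frames).
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    feat_dim = len(frames[0])
    # Mean of each feature dimension across all frames.
    means = [sum(f[d] for f in frames) / num_frames for d in range(feat_dim)]
    # Standard deviation of each feature dimension across all frames.
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(feat_dim)
    ]
    return means + stds


# Hypothetical toy input: three frames with two feature dimensions each.
frames = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
pooled = statistics_pooling(frames)
```

Because the output length depends only on the feature dimension, utterances of any duration map to embeddings of the same size, which is what allows a fixed-input back-end scorer such as PLDA to follow the pooling layer.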

Related Literature