Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title 基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統
Volume/Issue 22:2
Parallel title A Replay Spoofing Detection System Based on Discriminative Autoencoders
Authors 吳家隆, 許祥平, 呂淯鼎, 曹昱, 李鴻欣, 王新民
Pages 063-072
Keywords Speaker Verification; Speaker Verification Attack; Replay Attack Detection; Spoofing Attack; Discriminative Autoencoder; Deep Neural Network; THCI Core
Publication date 2017-12

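Chinese Abstract

In this paper, we propose a neural network model based on a discriminative autoencoder to automatically detect replay attacks on speaker verification systems, that is, to decide whether the audio received by a speaker verification system is genuine human speech or speech played back from a recording device. In the speaker verification field, an attack on a speaker verification system that uses artificially forged speech is called a spoofing attack. Because deep neural network models are already widely used in speech processing, we aim to apply such models to this problem. In the proposed discriminative autoencoder model, the middle layer of the network is used for feature extraction, and a new loss function is proposed so that the middle-layer features cluster according to the labels of the data. The new features therefore carry information that discriminates genuine speech from spoofed speech. Finally, cosine similarity is used to measure how close the extracted features are to those of genuine speech, which yields the detection result. We evaluate the system on the database provided by the 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof-2017). The proposed system performs well on the development set, with a relative improvement in accuracy of about 42% over the officially provided baseline method.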
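The final scoring step described in the abstract, comparing an extracted feature with genuine speech by cosine similarity, can be sketched as follows. This is a minimal illustration only: the function names, the `genuine_reference` vector, and the `threshold` value are assumptions made for the example and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_genuine(code, genuine_reference, threshold=0.5):
    """Hypothetical decision rule: accept when the code is close to the reference.

    `genuine_reference` stands in for a feature vector representing genuine
    speech, and `threshold` is an arbitrary illustrative value.
    """
    return cosine_similarity(code, genuine_reference) >= threshold


# Usage with made-up feature vectors.
reference = [1.0, 0.0, 1.0]
print(is_genuine([0.9, 0.1, 1.1], reference))    # close to the reference
print(is_genuine([-1.0, 0.5, -0.8], reference))  # points away from the reference
```

In practice the threshold would be tuned on held-out data, for example at the operating point where false acceptance and false rejection rates are equal.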

English Abstract

In this paper, we propose a discriminative autoencoder (DcAE) neural network model for the replay spoofing detection task, in which the system must decide whether a given utterance comes directly from the mouth of a speaker or indirectly through playback. The proposed DcAE model focuses on the middle (code) layer, where a speech utterance is factorized into distinct components with respect to its true label (genuine or spoofed) and its metadata (speaker, playback and recording devices, etc.). Moreover, a modified hinge loss is introduced to formulate the cost function of the DcAE model, which ensures that utterances with the same speech type or meta information share similar identity codes (i-codes) and receive higher similarity scores computed from their i-codes. Tested on the development set provided by ASVspoof 2017, our system achieved up to 42% relative improvement in equal error rate (EER) over the official baseline, which is based on a standard GMM classifier.
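To illustrate how a hinge-style cost can push same-type utterances toward similar codes, the sketch below applies a generic pairwise hinge penalty to a similarity score. The abstract does not specify the form of the modified hinge loss, so this is a common textbook variant used as a stand-in, not the loss from the paper. The function name and the `margin` value are assumptions.

```python
def pairwise_hinge_loss(similarity, same_type, margin=0.5):
    """Generic hinge-style penalty on a similarity score in [-1, 1].

    Assumed form (not the paper's modified hinge loss):
    - same-type pairs are penalized by how far the score falls below 1;
    - different-type pairs are penalized only when the score exceeds `margin`.
    """
    if same_type:
        return 1.0 - similarity
    return max(0.0, similarity - margin)


# Usage with made-up similarity scores.
print(pairwise_hinge_loss(1.0, same_type=True))    # identical codes, no penalty
print(pairwise_hinge_loss(0.75, same_type=False))  # too similar for different types
print(pairwise_hinge_loss(0.2, same_type=False))   # already below the margin
```

Minimizing such a penalty over many pairs raises the similarity of same-type pairs and lowers that of different-type pairs, which is the clustering behavior the abstract attributes to the i-codes.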

Related Literature