Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: 適合漸凍人使用之語音轉換系統初步研究 (A Preliminary Study of a Voice Conversion System for ALS Patients)
Volume/Issue: 24:2
English Title: Deep Neural-Network Bandwidth Extension and Denoising Voice Conversion System for ALS Patients
Authors: 黃百弘, 廖元甫, 鄧廣豐, Matúš Pleva, Daniel Hládek
Pages: 37-52
Keywords: Neural network (類神經網路), ALS, WaveNet, THCI Core
Publication Date: December 2019

Chinese Abstract

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease for which there is currently no cure. It gradually robs patients of the ability to speak, until they can no longer communicate by voice and lose part of their sense of identity. ALS patients therefore need suitable voice output communication aids (VOCAs), in particular aids with a personalized synthetic voice, i.e., the patient's own voice before onset, so that their sense of self can be preserved. However, most patients who can no longer speak in the late stage of ALS did not properly preserve personal recordings in advance; at best, only about 20 minutes of low-quality speech can be found, for example speech degraded by lossy compression (MP3), limited to a narrow bandwidth (8 kHz), or contaminated by strong background noise, which makes it impossible to build a personalized speech synthesis system suitable for the patient. To address these difficulties, this paper combines a general-purpose speech synthesis system with a voice conversion algorithm, adding a speech denoising stage at the front end and a speech super-resolution (bandwidth extension) module at the back end, so that recordings with background noise can be tolerated and high-frequency components (up to 16 kHz) can be restored to narrowband synthetic speech, reconstructing from low-quality recordings synthetic speech as close as possible to the patient's original voice. Speech denoising uses WaveNet, while speech super-resolution uses a U-Net architecture. A 20-hour high-quality (studio-recorded) Education Radio (教育電台) corpus is first used to simulate paired noisy/clean and narrowband/wideband utterances, which are used to train the WaveNet and U-Net models respectively; the trained models are then applied to the patients' low-quality recordings. Experimental results show that the trained WaveNet and U-Net models can restore noisy or narrowband Education Radio recordings to a considerable degree and can be used to rebuild high-quality personalized synthetic voices for ALS patients.
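The data-simulation step described above can be sketched concretely. The following minimal Python sketch (illustrative only, not the authors' implementation) shows one plausible way to derive paired training utterances from a clean 16 kHz studio corpus: a narrowband/wideband pair for the U-Net bandwidth-extension model, and a noisy/clean pair at a chosen SNR for the WaveNet denoiser. The dummy signals, function names, and the 5 dB SNR are assumptions for demonstration.

    import numpy as np
    from scipy.signal import resample_poly

    def simulate_narrowband(wav_16k):
        # Drop the 4-8 kHz band by resampling 16 kHz -> 8 kHz -> 16 kHz,
        # so the narrowband input and the wideband target have equal length.
        narrow = resample_poly(wav_16k, up=1, down=2)
        return resample_poly(narrow, up=2, down=1)

    def simulate_noisy(clean, noise, snr_db=5.0):
        # Mix background noise into the clean utterance at the requested SNR.
        noise = np.resize(noise, clean.shape)
        clean_pow = np.mean(clean ** 2) + 1e-12
        noise_pow = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(clean_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
        return clean + scale * noise

    # Stand-ins for one studio utterance and one noise clip (placeholders only).
    sr = 16000
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 220 * t)          # 1 s dummy "speech" signal
    noise = 0.1 * np.random.randn(sr)                  # 1 s dummy background noise

    unet_input = simulate_narrowband(clean)            # U-Net input; `clean` is its target
    wavenet_input = simulate_noisy(clean, noise, 5.0)  # WaveNet input; `clean` is its target

In practice, `clean` and `noise` would be corpus utterances and recorded background noise rather than synthetic signals, and the same simulation would be repeated over the whole 20-hour corpus.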

English Abstract

ALS (amyotrophic lateral sclerosis) is a neurodegenerative disease. There is no cure for this disease, and it eventually deprives ALS patients of the ability to use their own voice to communicate with others. Personalized voice output communication aids (VOCAs) are therefore essential for improving ALS patients' daily lives. However, most ALS patients have not properly preserved their personal recordings in the early stage of the disease. Usually, only a few low-quality speech recordings, such as lossy-compressed, narrowband (8 kHz), or noisy speech, are available for developing their personalized VOCAs. In order to reconstruct high-quality synthetic speech close to the original voice of ALS patients, this paper proposes a voice conversion system with speech denoising and bandwidth extension capabilities. A front-end WaveNet-based speech enhancement network and a back-end U-Net-based speech super-resolution network were constructed and integrated with the backbone voice conversion system. The experimental results show that the WaveNet and U-Net models can restore noisy and narrowband speech, respectively. Therefore, the approach is promising for reconstructing high-quality personalized VOCAs for ALS patients.
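As a rough illustration of the back-end speech super-resolution component, the sketch below shows a toy 1-D U-Net in PyTorch: strided convolutions in the encoder, transposed convolutions in the decoder, one skip connection, and a residual output. The layer counts and sizes are assumptions for demonstration and do not reproduce the network used in the paper; it is also assumed that the narrowband input has already been resampled to the 16 kHz grid so that input and output lengths match.

    import torch
    import torch.nn as nn

    class AudioUNet(nn.Module):
        # Toy 1-D U-Net for bandwidth extension: the input is narrowband speech on the
        # 16 kHz grid, and the network predicts an estimate of the missing high band.
        def __init__(self):
            super().__init__()
            # Encoder: strided convolutions halve the time resolution twice.
            self.enc1 = nn.Conv1d(1, 16, kernel_size=9, stride=2, padding=4)
            self.enc2 = nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4)
            # Decoder: transposed convolutions restore the resolution; the first decoder
            # output is concatenated with the matching encoder features (skip connection).
            self.dec1 = nn.ConvTranspose1d(32, 16, kernel_size=8, stride=2, padding=3)
            self.dec2 = nn.ConvTranspose1d(32, 1, kernel_size=8, stride=2, padding=3)
            self.act = nn.LeakyReLU(0.2)

        def forward(self, x):                    # x: (batch, 1, samples), samples divisible by 4
            e1 = self.act(self.enc1(x))          # (batch, 16, samples/2)
            e2 = self.act(self.enc2(e1))         # (batch, 32, samples/4)
            d1 = self.act(self.dec1(e2))         # (batch, 16, samples/2)
            d1 = torch.cat([d1, e1], dim=1)      # skip connection -> (batch, 32, samples/2)
            return x + self.dec2(d1)             # residual prediction of the high band

    net = AudioUNet()
    dummy = torch.randn(1, 1, 16000)             # one second of audio at 16 kHz
    print(net(dummy).shape)                      # torch.Size([1, 1, 16000])

A real model of this kind would be trained by minimizing, for example, an L1 or L2 loss between its output and the wideband target produced by the paired-data simulation described in the Chinese abstract.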
