Title | Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification |
---|---|
Volume:Issue | 12:3 |
Authors | Zheng, Nengheng; Lee, Tan; Wang, Ning; Ching, P.-C. |
Pages | 273-290 |
Keywords | Speaker Identification, Vocal Tract Feature, Vocal Source Feature, Information Fusion, Confidence Measure, THCI Core |
Publication Date | 2007-09 |

This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. Cepstral features aim to characterize the formant structure of the vocal tract system. This study proposes a new feature set, named the wavelet octave coefficients of residues (WOCOR), to characterize the vocal source excitation signal. WOCOR is derived by wavelet transformation of the linear predictive (LP) residual signal and is capable of capturing the spectro-temporal properties of vocal source excitation. WOCOR and MFCC contain complementary information for speaker recognition because they characterize two physiologically distinct components of speech production. The complementary contributions of MFCC and WOCOR to speaker identification are investigated. A score-level fusion technique based on a confidence measure is proposed to take full advantage of these two complementary features. Experiments show that an identification system using both MFCC and WOCOR significantly outperforms one using MFCC only: the MFCC-based system yields an identification error rate of 6.8%, whereas the proposed confidence-measure-based integrated system achieves 4.1%.
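
The abstract states that WOCOR starts from the LP residual signal. As a rough illustration of that first step only, the sketch below computes an LP residual for a single frame using the standard autocorrelation method and inverse filtering. The function names, the default order of 12, and the small diagonal loading term are assumptions made for this example; the abstract does not give the paper's LP order, framing, or wavelet settings, and the wavelet stage is not shown.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a_1..a_p with the autocorrelation method.

    Hypothetical helper for illustration; solves the normal equations R a = r.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz autocorrelation matrix built from lags 0..order-1.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Small diagonal loading for numerical stability (an assumption of this sketch).
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])


def lp_residual(frame, order=12):
    """Inverse-filter the frame: e[n] = s[n] - sum_k a_k * s[n - k]."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    # Truncate the convolution so the residual has the same length as the frame.
    return np.convolve(frame, inverse_filter)[: len(frame)]
```

Calling `lp_residual(frame)` on a windowed speech frame returns a signal of the same length in which the vocal tract (formant) contribution has been approximately removed. Under the abstract's description, a wavelet transform of this residual would then produce the source features.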