Title | On the Use of Speech Recognition Techniques to Identify Bird Species |
---|---|
Volume:Issue | 19:1 |
Authors | Wei-Ho Tsai, Yu-Zhi Xue |
Pages | 55-67 |
Keywords | Bird Species Identification, Bigram Model, Gaussian Mixture Model, Pitch, Timbre, THCI Core |
Publication date | 2014-03 |

Wild bird watching has become a popular leisure activity in recent years. People often see birds or hear their sounds without knowing which species they are observing. To help people learn to identify bird species from their sounds, we apply speech recognition techniques to build an automatic bird sound identification system. The system analyzes two acoustic cues: timbre and pitch. In the timbre-based analysis, Mel-Frequency Cepstral Coefficients (MFCCs) characterize the bird sound, and Gaussian Mixture Models (GMMs) then represent the MFCCs as a set of parameters. In the pitch-based analysis, bird sounds are converted from their waveform representations into sequences of MIDI notes, and bigram models capture how the notes change over time. We evaluated the system on the ten most common bird species in the Taipei urban area. Experiments on audio data collected from commercial CDs and websites show that the timbre-based, pitch-based, and combined systems achieve bird sound identification accuracies of 71.1%, 72.1%, and 75.0%, respectively.
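
To make the pitch-based branch concrete, the following is a minimal sketch of a bigram classifier over MIDI note sequences. It is not the authors' implementation: the abstract does not say how pitch is extracted, how the bigram probabilities are smoothed, or how scores are normalized. The sketch assumes a pitch tracker has already produced note sequences, uses add-one (Laplace) smoothing over the 128 MIDI note numbers, and averages the log-probability per transition. The names `hz_to_midi`, `BigramModel`, and `identify`, the species labels, and the toy note sequences are all hypothetical.

```python
import math
from collections import defaultdict


def hz_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


class BigramModel:
    """Bigram model over MIDI notes with add-alpha smoothing (an assumption of this sketch)."""

    def __init__(self, vocab_size=128, alpha=1.0):
        self.vocab_size = vocab_size  # number of distinct MIDI note numbers
        self.alpha = alpha            # smoothing constant
        self.pair_counts = defaultdict(int)  # counts of (previous note, current note)
        self.prev_counts = defaultdict(int)  # counts of the previous note alone

    def train(self, sequences):
        """Accumulate note-transition counts from a list of note sequences."""
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1

    def log_prob(self, prev, cur):
        """Smoothed log P(cur | prev)."""
        num = self.pair_counts.get((prev, cur), 0) + self.alpha
        den = self.prev_counts.get(prev, 0) + self.alpha * self.vocab_size
        return math.log(num / den)

    def score(self, seq):
        """Average log-probability per transition, so long and short clips are comparable."""
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return float("-inf")
        return sum(self.log_prob(p, c) for p, c in pairs) / len(pairs)


def identify(models, seq):
    """Return the label whose bigram model gives the note sequence the highest score."""
    return max(models, key=lambda name: models[name].score(seq))


# Toy data invented for illustration only; these are not real bird recordings.
toy_training = {
    "species_a": [[100, 102, 100, 102, 100], [100, 102, 100, 102]],
    "species_b": [[95, 95, 97, 95, 95, 97], [95, 97, 95, 95]],
}

models = {}
for name, seqs in toy_training.items():
    model = BigramModel()
    model.train(seqs)
    models[name] = model

print(identify(models, [100, 102, 100]))  # prints species_a
```

The abstract reports that combining timbre and pitch improves accuracy. One plausible way to do this (again an assumption, not the paper's stated method) is to add a weighted per-species GMM log-likelihood to each bigram score before taking the maximum.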