Robust Speech Recognition with Dynamic Time Warping and Nonlinear Median Filter

Yuxin Zhang; Yoshikazu Miyanaga; Constantin Siriteanu

doi:10.2299/jsp.16.147

Abstract

In this paper we propose a new robust automatic speech recognition (ASR) method using dynamic time warping (DTW) and a nonlinear median filter (NMF). Although conventional DTW is fast and requires no training, its recognition accuracy is limited: under all noisy conditions, it is lower than that of algorithms based on the hidden Markov model (HMM). To improve ASR accuracy, we therefore first apply the short-time energy method to remove nonspeech segments and then apply a noise-reduction method. Finally, whereas conventional DTW algorithms search for the reference word with the minimum distance from the unknown speech waveform, we use an NMF and search for the reference word with the minimum median distance from the unknown speech waveform. We find that the NMF substantially improves the recognition accuracy of conventional DTW implementations. In the presence of 10 dB and 20 dB white noise, our approach yields DTW recognition accuracy similar to that of HMM techniques, while the proposed DTW with the NMF requires no complicated training.
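The pipeline described in the abstract can be illustrated with a minimal sketch, assuming a few details the abstract does not give. The function names (`trim_nonspeech`, `dtw_distance`, `recognize`) and the parameters `frame_len`, `hop`, and `ratio` are hypothetical and chosen for illustration only. The sketch assumes that speech has already been converted to per-frame feature vectors (the feature type is not stated in the abstract), omits the noise-reduction step because the abstract does not specify it, and assumes one plausible reading of "minimum median distance": each vocabulary word has several reference templates, and the median of the DTW distances to those templates replaces the minimum used by conventional DTW. This is not the authors' implementation.

```python
import numpy as np


def trim_nonspeech(signal, frame_len=256, hop=128, ratio=0.1):
    """Short-time energy endpoint detection (illustrative parameters).

    Keeps the span between the first and last frames whose energy exceeds
    `ratio` times the maximum frame energy; returns an empty array if no
    frame qualifies.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return signal
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    active = np.flatnonzero(energy > ratio * energy.max())
    if active.size == 0:
        return signal[:0]
    return signal[active[0] * hop:active[-1] * hop + frame_len]


def dtw_distance(x, y):
    """Classic DTW with Euclidean local cost and steps (1,0), (0,1), (1,1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:  # treat 1-D input as a sequence of scalar frames
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]


def recognize(query, references, use_median=True):
    """Pick a word from `references` (word -> list of template sequences).

    use_median=False: conventional rule, smallest distance to any template.
    use_median=True: assumed median rule, smallest median distance over a
    word's templates.
    """
    scores = {}
    for word, templates in references.items():
        dists = [dtw_distance(query, t) for t in templates]
        scores[word] = np.median(dists) if use_median else min(dists)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    query = np.ones(4)
    refs = {"yes": [np.full(4, 1.5)] * 3,
            "no": [np.ones(4), np.full(4, 5.0), np.full(4, 5.0)]}
    print(recognize(query, refs, use_median=False))  # no
    print(recognize(query, refs))  # yes
```

In the toy usage, the word "no" has a single template that matches the query exactly but two that are far away, while every "yes" template is moderately close. The minimum rule is captured by the one outlier template, whereas the median rule reflects the typical distance for each word, which is the kind of robustness the median is meant to provide.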
