The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances

Yutaka TSUBOI; Takehiro IHARA; Kazuyuki TAKAGI; Kazuhiko OZEKI

doi:10.1093/ietisy/e91-d.6.1774

Abstract

A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven-overlapped subbands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with the filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!