Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
PAPERS
Voice activity detection in noise using modulation spectrum of speech: Investigation of speech frequency and modulation frequency ranges
Kimhuoch Pek, Takayuki Arai, Noboru Kanedera
JOURNAL FREE ACCESS

2012 Volume 33 Issue 1 Pages 33-44

Abstract

Voice activity detection (VAD) in noisy environments is an important preprocessing step in speech communication technology, a field that includes speech recognition, speech coding, speech enhancement, and video captioning. We have developed a VAD method for noisy environments based on the modulation spectrum. In Experiment 1, we investigate the optimal ranges of speech and modulation frequencies for the proposed algorithm using the simulated data in the CENSREC-1-C corpus. The results show that error rates are lower when the speech frequency band combines an upper limit between 1,000 and 2,000 Hz with a lower limit below 300 Hz than with other bands. Furthermore, the proposed method performs VAD well when it uses the frequency components of the modulation spectrum in the 3–9, 3–11, 3–14, 3–18, 4–9, 4–11, 4–14, 4–18, 5–7, 5–9, 5–11, or 5–14 Hz range. In Experiment 2, we take one of the best parameter settings from Experiment 1 and evaluate it on the real-environment data in the CENSREC-1-C corpus, comparing our method with conventional methods. Improvements in the VAD results were observed for each SNR condition and noise type.
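
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a modulation-spectrum feature, not the authors' method. It limits the signal to a speech frequency band, takes a frame-wise temporal envelope, and measures how much of the envelope's modulation energy falls inside a modulation band. The function name `modulation_band_ratio`, the FFT-mask band limiting, the RMS envelope, and the 10 ms frame length are all assumptions made for illustration. The default bands (0 to 2,000 Hz and 4 to 14 Hz) are picked from the ranges the abstract reports as performing well.

```python
import numpy as np


def modulation_band_ratio(signal, fs, speech_band=(0.0, 2000.0),
                          mod_band=(4.0, 14.0), frame_len=0.010):
    """Fraction of envelope modulation energy inside mod_band (illustrative)."""
    signal = np.asarray(signal, dtype=float)

    # 1. Keep only the chosen speech frequency band (simple FFT masking;
    #    an assumption, a real system would likely use a filter bank).
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < speech_band[0]) | (freqs > speech_band[1])] = 0.0
    band_signal = np.fft.irfft(spectrum, n=len(signal))

    # 2. Temporal envelope: RMS of non-overlapping frames (assumed 10 ms).
    hop = int(round(frame_len * fs))
    n_frames = len(band_signal) // hop
    frames = band_signal[: n_frames * hop].reshape(n_frames, hop)
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))

    # 3. Modulation spectrum: power spectrum of the mean-removed envelope.
    env_rate = fs / hop  # envelope sampling rate in Hz
    mod_spec = np.abs(np.fft.rfft(envelope - envelope.mean())) ** 2
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / env_rate)

    # 4. Share of modulation energy (DC excluded) inside the target band.
    in_band = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])
    total = mod_spec[1:].sum()
    return float(mod_spec[in_band].sum() / total) if total > 0 else 0.0
```

A segment whose ratio exceeds a tuned threshold could be labelled as speech. The threshold, the segment length, and any smoothing of the decisions are left open here because the abstract does not specify them.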

© 2012 by The Acoustical Society of Japan