Abstract
Voice activity detection (VAD) identifies speech and non-speech periods in observed signals and is an important technique for many speech signal processing applications. However, the detection accuracy of current VAD techniques drops drastically for noisy speech and for mixtures of speech and non-speech sounds such as music and animal sounds. VAD therefore needs to be robust enough to detect speech periods accurately under these conditions. This paper proposes a robust VAD method that uses empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to address these problems. The proposed method first reduces noise by using EMD and then determines speech/non-speech periods by using MSA. Five experiments on VAD in real environments were conducted to evaluate the proposed method against traditional methods (Otsu's method, G.729, and power envelope thresholding). The results demonstrated that the proposed method detected speech periods more accurately than the traditional methods.