Sound Source Separation in the Frequency Domain with Image Processing

Kazuhiro Ninagawa; Takashi Umeyama; Kenji Suzuki; Noboru Sugie

doi:10.1541/ieejeiss1987.121.12_1866

Abstract

We propose a new method for separately extracting each sound from a mixture of two concurrently uttered speech sounds. First, the mixture is transformed into a sound spectrogram, which is thereafter treated as an image. Using image processing techniques, the onsets and offsets of the frequency components of each speech sound are detected. The harmonic structure of each speech sound is then extracted by tracing each onset through to its corresponding offset and relating the traced components to one another in the frequency domain. A set of band-pass filters is designed to reflect the extracted harmonic structure, and each speech sound is extracted by applying this set of filters to the mixture. Experiments were conducted on a mixture of a male and a female speech sound, both consisting of Japanese vowels. The evaluation results showed that the proposed method separated the sounds reasonably well.
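
The processing chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which image processing operators are used, so onset and offset detection is replaced here by simple thresholding of one frequency bin's magnitude over time, and the band-pass filter set is approximated by a binary mask over spectrogram bins. All function names, parameters, and numeric values (threshold, fundamental frequency, bin spacing, band half-width) are hypothetical choices made for illustration.

```python
def detect_onsets_offsets(track, threshold):
    """Find onset and offset frame indices in one frequency bin's magnitude track.

    Assumption: a component counts as "on" while its magnitude is at or above
    `threshold`. This stands in for the image-based detection in the paper.
    """
    onsets, offsets = [], []
    prev_active = False
    for t, magnitude in enumerate(track):
        active = magnitude >= threshold
        if active and not prev_active:
            onsets.append(t)       # component switches on at frame t
        elif prev_active and not active:
            offsets.append(t)      # component switched off before frame t
        prev_active = active
    if prev_active:
        offsets.append(len(track))  # still active at the end of the track
    return onsets, offsets


def harmonic_mask(f0_hz, n_bins, bin_hz, half_width_hz):
    """Mark bins lying within `half_width_hz` of an integer multiple of `f0_hz`.

    Assumption: the harmonic structure is summarized by one fundamental
    frequency, and each pass band is centered on one of its harmonics.
    """
    mask = [False] * n_bins
    for k in range(n_bins):
        freq = k * bin_hz
        harmonic = round(freq / f0_hz)  # index of the nearest harmonic
        if harmonic >= 1 and abs(freq - harmonic * f0_hz) <= half_width_hz:
            mask[k] = True
    return mask


def apply_mask(frame, mask):
    """Keep only the bins of one spectrogram frame selected by the mask."""
    return [m if keep else 0.0 for m, keep in zip(frame, mask)]


# Toy usage with made-up numbers.
track = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0]
onsets, offsets = detect_onsets_offsets(track, threshold=0.5)

mask = harmonic_mask(f0_hz=100.0, n_bins=10, bin_hz=50.0, half_width_hz=10.0)
filtered = apply_mask([1.0] * 10, mask)
```

In this sketch, one mask would be built per speaker from that speaker's estimated fundamental frequency and applied to every frame of the mixture spectrogram. How to handle bins where the two speakers' harmonics overlap is left open here.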
