IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
Sound Source Separation in the Frequency Domain with Image Processing
Kazuhiro Ninagawa, Takashi Umeyama, Kenji Suzuki, Noboru Sugie

2001 Volume 121 Issue 12 Pages 1866-1874

Abstract
We propose a new method for separately extracting each sound from a mixture of two concurrently uttered speech sounds. First, the mixture is transformed into a sound spectrogram, which is thereafter treated as an image. Using image processing techniques, the onsets and offsets of the frequency components of each speech sound are detected. The harmonic structure of each speech sound is then extracted by tracing each onset through to its corresponding offset and relating the traced components to one another in the frequency domain. A set of band-pass filters is designed to reflect the extracted harmonic structure, and each speech sound is extracted by applying its set of band-pass filters to the mixture. Experiments were conducted on a mixture of a male speech sound and a female speech sound, both consisting of Japanese vowels. The evaluation results showed that the proposed method separated the two sounds reasonably well.
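
The final filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the fundamental frequency of each source is already known, whereas the paper derives the harmonic structure from onsets and offsets traced in the spectrogram image. It also applies the pass bands as a mask on a single DFT frame rather than as time-domain filters. The function names, the two synthetic harmonic tones, the number of harmonics and the band half-width are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a short toy signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; the real part is returned because the mask keeps conjugate symmetry."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def harmonic_tone(f0, amplitudes, fs, n):
    """Synthetic stand-in for a voiced sound: sinusoids at f0, 2*f0, 3*f0, ..."""
    return [sum(a * math.sin(2 * math.pi * (h + 1) * f0 * t / fs)
                for h, a in enumerate(amplitudes))
            for t in range(n)]


def harmonic_mask(f0, fs, n, n_harmonics=3, half_width=15.0):
    """One pass band of +/- half_width Hz around each of the first n_harmonics of f0."""
    mask = []
    for k in range(n):
        freq = min(k, n - k) * fs / n  # fold negative-frequency bins onto positive ones
        mask.append(any(abs(freq - h * f0) <= half_width
                        for h in range(1, n_harmonics + 1)))
    return mask


def extract(mixture, f0, fs):
    """Keep only the bins inside the harmonic pass bands of f0 and resynthesize."""
    spectrum = dft(mixture)
    mask = harmonic_mask(f0, fs, len(mixture))
    return idft([c if keep else 0j for c, keep in zip(spectrum, mask)])


# Assumed toy setup: 4 kHz sampling, 400 samples, so each DFT bin spans 10 Hz.
FS, N = 4000, 400
voice_a = harmonic_tone(100.0, [1.0, 0.5, 0.25], FS, N)  # lower-pitched source
voice_b = harmonic_tone(230.0, [0.8, 0.4, 0.2], FS, N)   # higher-pitched source
mixture = [a + b for a, b in zip(voice_a, voice_b)]

est_a = extract(mixture, 100.0, FS)
est_b = extract(mixture, 230.0, FS)

err_a = max(abs(e - s) for e, s in zip(est_a, voice_a))
err_b = max(abs(e - s) for e, s in zip(est_b, voice_b))
print("max reconstruction error:", err_a, err_b)
```

In this toy case the harmonics of the two sources fall in disjoint bands, so the separation is essentially exact. Real speech has time-varying pitch and overlapping harmonics, which is presumably why the method tracks components over time in the spectrogram rather than using a fixed comb.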
© The Institute of Electrical Engineers of Japan