MUSIC (MUltiple SIgnal Classification) is a standard direction-of-arrival (DOA) estimation method for persistently exciting continuous signals such as radio waves and ultrasonic waves. Recently it has also been applied to estimating the DOA of speech, which is not necessarily persistently exciting but instead alternates intermittently between voiced and silent sections. From this point of view, this paper proposes a frame-based DOA estimation method for a target sound source using two microphones. Specifically, the two observed mixtures are first transformed into complex spectra at each frame by the short-time Fourier transform. Local DOAs are then calculated from the phase difference between the two spectra. Next, a distribution of these local DOAs is formed and evaluated with a sparseness measure, and a frame whose sparseness value exceeds a threshold is judged to be a single-source frame. Finally, for such a frame, the angle at the peak of the distribution is adopted as the DOA estimate. The validity of the proposed method has been confirmed by simulations in environments where the SNR is more than 20 dB and the reverberation time is within 200 ms.
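The per-frame steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the sparseness measure, so Hoyer's sparseness is swapped in here as an assumption. The far-field model sin(theta) = c * dphi / (2 * pi * f * d), the 300 Hz lower band edge, the 36 histogram bins, the threshold 0.6, and the names `frame_doa`, `hoyer_sparseness` and `delay` are likewise illustrative choices.

```python
import numpy as np

def hoyer_sparseness(h):
    """Hoyer sparseness in [0, 1]; assumed here, the abstract names no measure."""
    h = np.asarray(h, dtype=float)
    l2 = np.sqrt(np.sum(h ** 2))
    if l2 == 0.0:
        return 0.0
    n = h.size
    return float((np.sqrt(n) - np.sum(np.abs(h)) / l2) / (np.sqrt(n) - 1.0))

def frame_doa(x1, x2, fs, d, c=343.0, f_min=300.0, n_bins=36, threshold=0.6):
    """Return (DOA in degrees or None, sparseness) for one two-microphone frame."""
    win = np.hanning(len(x1))
    X1 = np.fft.rfft(x1 * win)                 # complex spectrum, microphone 1
    X2 = np.fft.rfft(x2 * win)                 # complex spectrum, microphone 2
    freqs = np.fft.rfftfreq(len(x1), d=1.0 / fs)
    # Keep bins above f_min and below the spatial-aliasing frequency c / (2 d).
    keep = (freqs >= f_min) & (freqs < c / (2.0 * d))
    # Inter-microphone phase difference per frequency bin.
    phase = np.angle(X1[keep] * np.conj(X2[keep]))
    # Local DOA per bin from the assumed far-field model.
    s = c * phase / (2.0 * np.pi * freqs[keep] * d)
    s = s[np.abs(s) <= 1.0]                    # drop physically impossible values
    theta = np.degrees(np.arcsin(s))
    # Distribution of local DOAs and its sparseness.
    hist, edges = np.histogram(theta, bins=n_bins, range=(-90.0, 90.0))
    sp = hoyer_sparseness(hist)
    if sp <= threshold:                        # not judged a single-source frame
        return None, sp
    k = int(np.argmax(hist))                   # peak of the distribution
    return float(0.5 * (edges[k] + edges[k + 1])), sp

def delay(x, tau, fs):
    """Circularly delay x by tau seconds via a phase ramp (synthetic-data helper)."""
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau), n=len(x))

# Synthetic single-source frame: white noise arriving from 22 degrees.
rng = np.random.default_rng(0)
fs, d, c = 16000, 0.04, 343.0
src = rng.standard_normal(1024)
tau = d * np.sin(np.radians(22.0)) / c
doa, sp = frame_doa(src, delay(src, tau, fs), fs, d)
print(doa, sp)
```

In this sketch a positive angle means the sound reaches microphone 1 first. A frame that fails the sparseness test returns `None` instead of an estimate, which mirrors the idea of estimating the DOA only at single-source frames.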