We introduce a new optimized microphone-array processing method for a spoken-dialogue robot in noisy and reverberant environments. The method is based on frequency-domain blind signal extraction, a signal separation algorithm that exploits the sparseness of a speech signal to separate the target speech and diffuse background noise from the sound mixture captured by a microphone array. This algorithm is combined with multichannel Wiener filtering so that it can effectively suppress both background noise and reverberation, given a priori information of room reverberation time. In this paper, first, we develop an automatic optimization scheme based on the assessment of musical noise via higher-order statistics and acoustic model likelihood. Next, to maintain the optimum performance of the system, we propose the multimodal switching scheme using the distance information provided by robot's image sensor and the estimation of SNR condition. Experimental evaluations have been conducted to confirm the efficacy of this method.
2015 by The Acoustical Society of Japan