Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
VMInNet: Interpolation of Virtual Microphones in Optimal Latent Space Explored by Autoencoder
Riki TakahashiLi LiShoji MakinoTakeshi Yamada
Author information
JOURNAL FREE ACCESS

2021 Volume 25 Issue 6 Pages 245-250

Details
Abstract

In this paper, we propose a new method for the interpolation of virtual signals between two real microphones to improve speech enhancement performance in underdetermined situations. The virtual microphone technique is a recently proposed technique that can virtually increase the number of channels of observed signals by linearly interpolating the phase and nonlinearly interpolating the amplitude based on β-divergence in the short-time Fourier transform (STFT) domain. This technique has been shown to be effective in improving the speech enhancement performance of beamforming in underdetermined situations. It is reasonable to linearly interpolate the phase based on the sound propagation model and nonlinearly interpolate the amplitude to increase the information content of the observed signals. However, there is no theoretical proof that β-divergence is the optimal criterion for amplitude interpolation due to the complexity of the physical model of amplitude. In this paper, we propose the use of an autoencoder to search for the optimal interpolation domain in a data-driven manner. We perform amplitude interpolation in the latent space, a low-dimensional representation space of observed mixture signals that is trained so that the interpolated virtual signals are optimal for conducting beamforming with high performance. Experimental results revealed that the proposed method achieved higher speech enhancement performance than conventional methods.

Content from these authors
© 2021 Research Institute of Signal Processing, Japan
Previous article Next article
feedback
Top