VMInNet: Interpolation of Virtual Microphones in Optimal Latent Space Explored by Autoencoder

Riki Takahashi; Li Li; Shoji Makino; Takeshi Yamada

doi:10.2299/jsp.25.245

Abstract

In this paper, we propose a new method for interpolating virtual signals between two real microphones to improve speech enhancement performance in underdetermined situations. The virtual microphone technique, proposed recently, virtually increases the number of channels of the observed signals in the short-time Fourier transform (STFT) domain by interpolating the phase linearly and the amplitude nonlinearly on the basis of β-divergence. This technique has been shown to improve the speech enhancement performance of beamforming in underdetermined situations. It is reasonable to interpolate the phase linearly, following the sound propagation model, and to interpolate the amplitude nonlinearly so as to increase the information content of the observed signals. However, because the physical model of amplitude is complex, there is no theoretical proof that β-divergence is the optimal criterion for amplitude interpolation. In this paper, we propose using an autoencoder to search for the optimal interpolation domain in a data-driven manner. We perform amplitude interpolation in the latent space, a low-dimensional representation of the observed mixture signals, which is trained so that the interpolated virtual signals yield high beamforming performance. Experimental results revealed that the proposed method achieved higher speech enhancement performance than conventional methods.
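The conventional virtual-microphone interpolation described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the amplitude formula used in earlier virtual-microphone work: the amplitude A minimizing the weighted β-divergence (1 − α)D_β(A | A₁) + αD_β(A | A₂), which works out to a weighted power mean with exponent β − 1 (the weighted geometric mean when β = 1). It also assumes that the phase is interpolated along the wrapped phase difference. All function and parameter names (`virtual_mic`, `alpha`, `beta`, `eps`, `latent_interpolate_amplitude`) are hypothetical. The last function shows why a learned domain is a natural generalization. β-divergence interpolation is exactly linear interpolation after the fixed transform A ↦ A^(β−1) (or log A when β = 1), so swapping that transform for a trained encoder/decoder pair gives a data-driven interpolation domain. Linear interpolation of the latent codes is itself an assumption here, since the abstract does not say how VMInNet combines them.

```python
import numpy as np


def interpolate_amplitude(a1, a2, alpha, beta):
    """Weighted beta-divergence interpolation of two amplitude spectrograms.

    Returns the amplitude a minimizing
        (1 - alpha) * D_beta(a | a1) + alpha * D_beta(a | a2),
    i.e. a weighted power mean with exponent (beta - 1), or the weighted
    geometric mean in the limit beta -> 1. (Assumed formulation.)
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    if np.isclose(beta, 1.0):
        return np.exp((1.0 - alpha) * np.log(a1) + alpha * np.log(a2))
    p = beta - 1.0
    return ((1.0 - alpha) * a1**p + alpha * a2**p) ** (1.0 / p)


def interpolate_phase(x1, x2, alpha):
    """Linear phase interpolation along the wrapped difference in [-pi, pi]."""
    dphi = np.angle(x2 * np.conj(x1))  # phase of x2 relative to x1, wrapped
    return np.angle(x1) + alpha * dphi


def virtual_mic(x1, x2, alpha=0.5, beta=1.0, eps=1e-12):
    """Hypothetical virtual STFT signal between observed STFT signals x1, x2.

    alpha in [0, 1] is the relative position of the virtual microphone
    (alpha=0 gives x1, alpha=1 gives x2); eps keeps log and negative
    powers finite for zero-valued bins.
    """
    a1 = np.maximum(np.abs(x1), eps)
    a2 = np.maximum(np.abs(x2), eps)
    amp = interpolate_amplitude(a1, a2, alpha, beta)
    return amp * np.exp(1j * interpolate_phase(x1, x2, alpha))


def latent_interpolate_amplitude(a1, a2, alpha, encode, decode):
    """Interpolate linearly in a latent domain, then map back.

    With encode=np.log and decode=np.exp this reproduces the beta=1 case;
    a learned encoder/decoder pair would replace this fixed transform.
    """
    z = (1.0 - alpha) * encode(a1) + alpha * encode(a2)
    return decode(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (257, 100)  # hypothetical (frequency bins, frames)
    x1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    x2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    v = virtual_mic(x1, x2, alpha=0.5, beta=1.0)
    print(v.shape)  # (257, 100)
```

Under this formulation, `beta=2` gives the arithmetic mean of the amplitudes and `beta=0` the harmonic mean. In a beamforming pipeline, the virtual channel would be stacked with the two real channels before the spatial statistics are estimated.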
