Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
VMInNet: Interpolation of Virtual Microphones in Optimal Latent Space Explored by Autoencoder
Riki Takahashi, Li Li, Shoji Makino, Takeshi Yamada

2021, Vol. 25, No. 6, pp. 245-250

Abstract

In this paper, we propose a new method for interpolating virtual signals between two real microphones to improve speech enhancement performance in underdetermined situations, that is, when there are fewer microphones than sources. The virtual microphone technique, proposed recently, virtually increases the number of channels of the observed signals by interpolating, in the short-time Fourier transform (STFT) domain, the phase linearly and the amplitude nonlinearly on the basis of β-divergence. This technique has been shown to improve the speech enhancement performance of beamforming in underdetermined situations. Linear phase interpolation is justified by the sound propagation model, and nonlinear amplitude interpolation is reasonable because it increases the information content of the observed signals. However, because the physical model of amplitude is complex, there is no theoretical proof that β-divergence is the optimal criterion for amplitude interpolation. We therefore propose using an autoencoder to search for the optimal interpolation domain in a data-driven manner. Amplitude interpolation is performed in the latent space, a low-dimensional representation space of the observed mixture signals, which is trained so that the interpolated virtual signals enable high-performance beamforming. Experimental results showed that the proposed method achieved higher speech enhancement performance than conventional methods.
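To make the conventional virtual microphone interpolation concrete, the following is a minimal NumPy sketch for two complex STFT channels x1 and x2 and an interpolation parameter alpha in [0, 1]. It assumes that the virtual amplitude minimizes (1 - alpha) D_β(A | A1) + alpha D_β(A | A2) over the first argument of the β-divergence, which yields a weighted power mean with exponent β - 1 (the weighted geometric mean in the limit β = 1, the Kullback-Leibler case), and that the phase is interpolated along the wrapped inter-channel phase difference. The function names, the clipping constant `eps`, and `interpolate_latent` (a stand-in for the proposed approach, with hypothetical `encode`/`decode` callables and an assumed linear mixing of latent codes) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def interpolate_amplitude_beta(a1, a2, alpha, beta):
    """Weighted power mean of two nonnegative STFT amplitudes.

    Assumed criterion: minimize (1 - alpha) * D_beta(a | a1) + alpha * D_beta(a | a2)
    over a. Setting the derivative to zero gives
    a ** (beta - 1) = (1 - alpha) * a1 ** (beta - 1) + alpha * a2 ** (beta - 1).
    beta = 1 is handled as the limit, i.e. the weighted geometric mean.
    """
    eps = 1e-12  # clip zeros so log(0) and 0 ** (negative) cannot occur
    a1 = np.maximum(np.asarray(a1, dtype=float), eps)
    a2 = np.maximum(np.asarray(a2, dtype=float), eps)
    if np.isclose(beta, 1.0):
        return np.exp((1 - alpha) * np.log(a1) + alpha * np.log(a2))
    p = beta - 1.0
    return ((1 - alpha) * a1 ** p + alpha * a2 ** p) ** (1.0 / p)


def interpolate_phase(x1, x2, alpha):
    """Linear phase interpolation along the wrapped phase difference of x2 relative to x1."""
    dphi = np.angle(x2 * np.conj(x1))  # wrapped to [-pi, pi]
    return np.angle(x1) + alpha * dphi


def virtual_microphone(x1, x2, alpha, beta):
    """Virtual STFT signal between two real channels (complex arrays of equal shape)."""
    amp = interpolate_amplitude_beta(np.abs(x1), np.abs(x2), alpha, beta)
    return amp * np.exp(1j * interpolate_phase(x1, x2, alpha))


def interpolate_latent(encode, decode, a1, a2, alpha):
    """Hypothetical latent-space amplitude interpolation.

    `encode` and `decode` stand in for a trained autoencoder; mixing the
    latent codes linearly is an assumption made for illustration only.
    """
    z = (1 - alpha) * encode(a1) + alpha * encode(a2)
    return decode(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (257, 100)  # (frequency bins, frames); arbitrary example size
    x1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    x2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    v = virtual_microphone(x1, x2, alpha=0.5, beta=0.5)
    print(v.shape)  # (257, 100)
```

At alpha = 0 and alpha = 1 the sketch returns x1 and x2, respectively. The β-dependent power mean can itself be read as linear interpolation in a fixed, hand-designed latent space: encoding A as A^(β - 1) (or log A when β = 1), mixing linearly, and decoding with the inverse map reproduces `interpolate_amplitude_beta`. Under the assumptions above, replacing that fixed map with a learned encoder and decoder illustrates the kind of substitution the abstract describes.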

© 2021 Research Institute of Signal Processing, Japan