Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
Single-Channel Multispeaker Separation with Variational Autoencoder Spectrogram Model
Naoya Murashima, Hirokazu Kameoka, Li Li, Shogo Seki, Shoji Makino

2021, Volume 25, Issue 4, pp. 145-149

Abstract

This paper deals with single-channel speaker-dependent speech separation. While discriminative approaches using deep neural networks (DNNs) have recently proved powerful, generative approaches, including methods based on non-negative matrix factorization (NMF), remain attractive because of their flexibility in handling the mismatch between training and test conditions. Although NMF-based methods work reasonably well for particular sound sources, one limitation is that they can fail for sources whose spectrograms do not comply with the NMF model. To address this problem, attempts have recently been made to replace the NMF model with DNNs. With a similar motivation, we propose in this paper a variational autoencoder (VAE)-based monaural source separation (VASS) method that uses a conditional VAE (CVAE) for source spectrogram modeling. We further propose an extension of the VASS method, called the discriminative VASS (DVASS) method, which uses a discriminative training criterion so that the separated signals are directly optimized. Experimental results revealed that the VASS method performed better than an NMF-based method, and the DVASS method performed better than the VASS method.
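
As a rough illustration of the kind of source spectrogram model the abstract refers to, the sketch below shows a conditional VAE whose encoder and decoder are both conditioned on a one-hot speaker label, with the decoder producing the local variance of each frequency bin. This is not the authors' implementation: the layer sizes, latent dimension, and the Itakura-Saito-style Gaussian output assumption are illustrative choices only.

```python
# Minimal CVAE spectrogram-model sketch (PyTorch); all architectural details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CVAESpectrogramModel(nn.Module):
    def __init__(self, n_freq=513, n_speakers=2, z_dim=16, hidden=256):
        super().__init__()
        # Encoder q(z | s, c): power-spectrum frame + one-hot speaker label -> latent
        self.enc = nn.Sequential(
            nn.Linear(n_freq + n_speakers, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.enc_mu = nn.Linear(hidden, z_dim)
        self.enc_logvar = nn.Linear(hidden, z_dim)
        # Decoder p(s | z, c): latent + speaker label -> log-variance of each frequency bin
        self.dec = nn.Sequential(
            nn.Linear(z_dim + n_speakers, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )

    def encode(self, s, c):
        h = self.enc(torch.cat([s, c], dim=-1))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z, c):
        # Log power of each bin, interpreted as a local Gaussian variance.
        return self.dec(torch.cat([z, c], dim=-1))

    def forward(self, s, c):
        mu, logvar = self.encode(s, c)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decode(z, c), mu, logvar


def negative_elbo(model, s_pow, c):
    """Negative ELBO for power-spectrogram frames s_pow, using an
    Itakura-Saito-style reconstruction term (an assumed modeling choice)."""
    log_v, mu, logvar = model(s_pow, c)
    recon = (s_pow / torch.exp(log_v) + log_v).sum(dim=-1)       # IS-divergence-like term
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1)
    return (recon + kl).mean()


if __name__ == "__main__":
    model = CVAESpectrogramModel()
    s_pow = torch.rand(8, 513)                                   # 8 toy power-spectrum frames
    c = F.one_hot(torch.randint(0, 2, (8,)), 2).float()          # random speaker labels
    loss = negative_elbo(model, s_pow, c)
    loss.backward()
    print(float(loss))
```

In a separation setting of this kind, such a trained model would serve as a prior on each speaker's spectrogram; the DVASS extension described in the abstract would instead train with a criterion evaluated on the separated signals themselves, a step not shown in this sketch.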

© 2021 Research Institute of Signal Processing, Japan