Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
Single-Channel Multispeaker Separation with Variational Autoencoder Spectrogram Model
Naoya Murashima, Hirokazu Kameoka, Li Li, Shogo Seki, Shoji Makino

2021 Volume 25 Issue 4 Pages 145-149

Abstract

This paper deals with single-channel speaker-dependent speech separation. While discriminative approaches using deep neural networks (DNNs) have recently proved powerful, generative approaches, including methods based on non-negative matrix factorization (NMF), remain attractive because of their flexibility in handling mismatches between training and test conditions. Although NMF-based methods work reasonably well for particular sound sources, one limitation is that they can fail for sources whose spectrograms do not comply with the NMF model. To address this problem, attempts have recently been made to replace the NMF model with DNNs. With a similar motivation, we propose in this paper a variational autoencoder (VAE)-based monaural source separation (VASS) method that uses a conditional VAE (CVAE) for source spectrogram modeling. We further propose an extension of the VASS method, called the discriminative VASS (DVASS) method, which uses a discriminative training criterion so that the quality of the separated signals is directly optimized. Experimental results revealed that the VASS method outperformed an NMF-based method, and the DVASS method outperformed the VASS method.
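For intuition, the sketch below shows what a conditional VAE spectrogram model of the kind described in the abstract might look like. It is a minimal illustration in PyTorch under assumed settings: the class name, layer sizes, and the choice of a one-hot speaker label conditioning both encoder and decoder are hypothetical, not the authors' implementation.

# Minimal sketch of a CVAE for source spectrogram modeling, in the
# spirit of the VASS method. Architecture and dimensions are
# illustrative assumptions, not the paper's actual network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAESpectrogramModel(nn.Module):
    def __init__(self, n_freq=513, n_speakers=2, latent_dim=16):
        super().__init__()
        self.n_speakers = n_speakers
        # Encoder q(z | s, c): spectrogram frame + speaker label -> latent
        self.enc = nn.Sequential(
            nn.Linear(n_freq + n_speakers, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # mean and log-variance
        )
        # Decoder p(s | z, c): latent + speaker label -> spectrogram frame
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + n_speakers, 256), nn.ReLU(),
            nn.Linear(256, n_freq),
        )

    def forward(self, spec, speaker_id):
        # One-hot speaker label conditions both encoder and decoder.
        c = F.one_hot(speaker_id, self.n_speakers).float()
        mu, logvar = self.enc(torch.cat([spec, c], dim=-1)).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec(torch.cat([z, c], dim=-1))
        # Standard VAE objective terms: reconstruction is computed by the
        # caller; the KL term regularizes the latent toward N(0, I).
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

# Example usage (training step on synthetic data):
# model = CVAESpectrogramModel()
# spec = torch.rand(8, 513)             # batch of magnitude-spectrogram frames
# spk = torch.randint(0, 2, (8,))       # speaker labels
# recon, kl = model(spec, spk)
# loss = F.mse_loss(recon, spec) + kl   # reconstruction + KL

Speaker conditioning is what makes a single decoder reusable across speakers at separation time; the DVASS extension described above would additionally back-propagate a separation-quality criterion through this model rather than training it on reconstruction alone.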

© 2021 Research Institute of Signal Processing, Japan