IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508

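A final published version of this article is available. Please refer to the final published version.
When citing, please cite the final published version as well.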

Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique
Tanasan Srikotr, Kazunori Mano
Journal, restricted access, advance online publication

Article ID: 2021SMP0018

Abstract

The spectral envelope is a speech parameter that strongly affects vocoder quality. The Vector Quantized Variational AutoEncoder (VQ-VAE) has recently become a state-of-the-art end-to-end quantization method based on deep learning. This paper proposes a new technique, called VQ-VAE-EMGAN, that improves the embedding space learning of VQ-VAE with a Generative Adversarial Network (GAN) for quantizing the spectral envelope parameter. In the experiments, we designed quantizers for the spectral envelope parameters of the WORLD vocoder extracted from 16 kHz speech waveforms. The results show that, compared with the conventional VQ-VAE, the proposed technique reduced the Log Spectral Distortion (LSD) by around 0.5 dB and increased the PESQ score by around 0.17 on average over four target bit operations.
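
For readers unfamiliar with the LSD figure quoted above, the following is a minimal sketch of one common definition: the root-mean-square difference, in dB, between the log power spectra of a reference and a quantized envelope, averaged over frames. The abstract does not state the exact formula the authors used, so the function name `log_spectral_distortion`, the `eps` floor, and the use of power (rather than magnitude) spectra are assumptions made for illustration only.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Mean per-frame LSD in dB between two sequences of power-spectral envelopes.

    Assumed definition (not taken from the paper): for each frame, the RMS over
    frequency bins of 10*log10(ref/est), then the average over all frames.
    """
    if len(ref_frames) != len(est_frames):
        raise ValueError("frame counts differ")
    per_frame = []
    for ref, est in zip(ref_frames, est_frames):
        if len(ref) != len(est):
            raise ValueError("bin counts differ")
        # Squared difference of the log power spectra (in dB) for each bin.
        # eps is a small floor that avoids log(0) on empty bins.
        squared = [
            (10.0 * math.log10((r + eps) / (e + eps))) ** 2
            for r, e in zip(ref, est)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    # Toy envelopes: two frames with three frequency bins each (made-up values).
    reference = [[1.0, 2.0, 4.0], [1.0, 1.0, 1.0]]
    quantized = [[1.1, 1.9, 4.2], [0.9, 1.0, 1.2]]
    print(f"LSD = {log_spectral_distortion(reference, quantized):.3f} dB")
```

Under this definition, identical envelopes give 0 dB, and a uniform tenfold power error gives 10 dB, so a reduction of around 0.5 dB corresponds to a closer match between the quantized and original envelopes.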

© 2021 The Institute of Electronics, Information and Communication Engineers