Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network

JianFeng WU; HuiBin QIN; YongZhu HUA; LiHuan SHAO; Ji HU; ShengYing YANG

doi:10.1587/transinf.2019EDL8023

Abstract

This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge in applying a DNN to VQ is reducing the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods are adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, and 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been evaluated extensively on the quantization of speech magnitude spectra. The results demonstrate that each method improves coding performance. When used to quantize 968-dimensional speech spectra with only 18 bits, the DNN-based VQ framework achieved an average PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
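The three fine-tuning methods listed in the abstract can be illustrated with a minimal NumPy sketch of a coding layer. This is not the authors' implementation: the function names, the sigmoid activation, the 0.5 threshold, and the penalty form h(1 - h) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coding_layer(pre_activation, noise_std=0.0, force_binary=False, rng=None):
    """Hypothetical coding layer of an auto-encoder (illustrative only).

    noise_std > 0  : method 1, Gaussian noise added to the coding-layer input.
    force_binary   : method 2, outputs thresholded to {0, 1} (threshold 0.5 assumed).
    """
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.normal(0.0, noise_std, size=pre_activation.shape)
        pre_activation = pre_activation + noise
    code = sigmoid(pre_activation)
    if force_binary:
        code = (code >= 0.5).astype(np.float64)
    return code


def non_binary_penalty(code):
    """Method 3 (assumed form): mean of h * (1 - h).

    The term is zero when every unit is exactly 0 or 1 and largest at 0.5,
    so adding it to the reconstruction loss pushes units toward binary values.
    """
    return float(np.mean(code * (1.0 - code)))


def codes_to_index(binary_code):
    """Pack a binary code (most significant bit first) into a codeword index."""
    bits = np.asarray(binary_code).astype(np.int64)
    weights = 1 << np.arange(bits.shape[-1] - 1, -1, -1, dtype=np.int64)
    return bits @ weights


# Usage: an 18-unit coding layer yields one of 2**18 = 262144 codeword indices.
rng = np.random.default_rng(0)
pre = rng.normal(size=18)
soft_code = coding_layer(pre, noise_std=0.5, rng=rng)  # noisy, non-binary code
hard_code = coding_layer(pre, force_binary=True)       # binary code
index = codes_to_index(hard_code)
penalty = non_binary_penalty(soft_code)
```

With 18 binary coding units, each 968-dimensional spectrum maps to a single index below 2^18 = 262,144, which is the sense in which the auto-encoder acts as a vector quantizer in this sketch.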
