IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network
JianFeng WUHuiBin QINYongZhu HUALiHuan SHAOJi HUShengYing YANG
Author information
JOURNAL FREE ACCESS

2019 Volume E102.D Issue 10 Pages 2047-2050

Details
Abstract

This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.

Content from these authors
© 2019 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top