IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition
Chao-Yuan KAO, Sangwook PARK, Alzahra BADI, David K. HAN, Hanseok KO

2020, Volume E103.D, Issue 5, pp. 1195-1198

Abstract

The performance of Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN framework integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP achieves a 14.1% improvement in the convergence rate of the Wasserstein distance, and in ASR tests the OGP-enhanced features achieve WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
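For context, the abstract does not spell out the OGP term itself. A minimal sketch of the standard WGAN gradient penalty (WGAN-GP, Gulrajani et al., 2017), which the proposed OGP modifies, might look like the following in PyTorch; the names `critic`, `real`, `fake`, and `lambda_gp` are illustrative assumptions, not the authors' code.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Standard WGAN-GP term (Gulrajani et al., 2017).

    Illustrative sketch only: the Letter's orthogonal gradient
    penalty (OGP) alters this penalty, but its exact form is not
    given in the abstract. Assumes 2D feature batches (B, F).
    """
    # Interpolate between real and generated feature batches.
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)

    # Critic score at the interpolated points.
    score = critic(interp)

    # Gradient of the critic output w.r.t. its input.
    grads = torch.autograd.grad(
        outputs=score,
        inputs=interp,
        grad_outputs=torch.ones_like(score),
        create_graph=True,
    )[0]

    # Penalize deviation of the gradient norm from 1 (1-Lipschitz).
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```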

© 2020 The Institute of Electronics, Information and Communication Engineers