Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 4Xin1-26

Frequency Analysis in Voice Conversion Using Generative Adversarial Networks
*Fuya WADA, Yoshiaki KUROSAWA, Kazuya MERA, Toshiyuki TAKEZAWA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In recent years, deep learning has enabled high-quality speech synthesis and voice quality conversion. Conventional methods perform voice conversion with a GAN (Generative Adversarial Network). However, the generated speech sounds slightly muffled compared with natural speech, and the generated 2D features also have shortcomings. In this study, we therefore divide the generated spectrogram into several frequency bands and compute the Mel-Cepstrum Distortion (MCD) of each band to investigate which frequency bands are generated well. The analysis showed that the low-frequency band of the generated spectrograms was reproduced well, whereas the mid- and high-frequency bands were not. In addition, we found that although the linguistic information was reproduced, the reproduction of speaker characteristics was insufficient.
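
The band-wise analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the number of bands, the band edges, or how cepstral coefficients are extracted. The sketch assumes time-aligned log-mel spectrograms stored as lists of frames, takes an orthonormal DCT of each band to obtain cepstrum-like coefficients, skips the 0th (energy) coefficient, and applies the commonly used MCD formula (10 / ln 10) * sqrt(2 * sum of squared differences). The names `dct_ortho`, `mcd`, `bandwise_mcd`, and the `edges` parameter are hypothetical.

```python
import math


def dct_ortho(frame, n_coef):
    """First n_coef coefficients of the orthonormal DCT-II of one frame."""
    n = len(frame)
    coefs = []
    for k in range(n_coef):
        s = sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coefs.append(scale * s)
    return coefs


def mcd(ref_band, gen_band, n_coef=8):
    """Mean distortion in dB between two band-limited log-mel spectrograms.

    Assumption: both inputs are lists of frames (frames x bins) that are
    already time-aligned; extra frames in the longer input are ignored.
    """
    n_frames = min(len(ref_band), len(gen_band))
    if n_frames == 0:
        raise ValueError("need at least one frame")
    const = 10.0 / math.log(10.0)
    total = 0.0
    for ref_frame, gen_frame in zip(ref_band, gen_band):
        k = min(n_coef, len(ref_frame))
        c_ref = dct_ortho(ref_frame, k)
        c_gen = dct_ortho(gen_frame, k)
        # Skip coefficient 0 (overall energy), as is conventional for MCD.
        sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_gen[1:]))
        total += const * math.sqrt(2.0 * sq)
    return total / n_frames


def bandwise_mcd(ref, gen, edges, n_coef=8):
    """Split both spectrograms at the given mel-bin edges; score each band.

    edges is a hypothetical parameter, e.g. [0, 4, 8, 12] for three bands.
    Returns a dict mapping (low_bin, high_bin) to the mean distortion.
    """
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        ref_band = [frame[lo:hi] for frame in ref]
        gen_band = [frame[lo:hi] for frame in gen]
        result[(lo, hi)] = mcd(ref_band, gen_band, n_coef)
    return result
```

Calling `bandwise_mcd(ref, gen, [0, 4, 8, 12])` would return one score per band, so a band with a larger value is one where the generated spectrogram deviates more from the reference. Because each band is transformed separately, scores are most directly comparable between bands of equal width.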

© 2023 The Japanese Society for Artificial Intelligence