Host: The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
Star generative adversarial network for voice conversion (StarGAN-VC) is a method that enables non-parallel many-to-many voice conversion. Although preserving linguistic information is essential in voice conversion, speech converted by StarGAN-VC sometimes loses it. This is because StarGAN-VC does not use any linguistic information when learning the conversion and focuses only on non-symbolic acoustic features. This paper proposes a method that uses speech recognition results estimated by automatic speech recognition (ASR) when training the StarGAN-VC generator. Experiments show that the proposed method enables StarGAN-VC to retain more linguistic information than the vanilla StarGAN-VC.
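
The abstract does not state how the ASR results enter the generator's objective. As an illustration only, one plausible form is to add a weighted penalty when an ASR model's output on the converted speech disagrees with the labels it recognized on the source speech. The sketch below assumes frame-level ASR labels and logits. The names `asr_consistency_loss`, `generator_loss`, and `asr_weight` are hypothetical and do not come from the paper, and `base_loss` stands in for the usual StarGAN-VC generator loss terms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asr_consistency_loss(source_labels, converted_logits):
    """Mean cross-entropy between ASR labels recognized on the source speech
    and ASR logits computed on the converted speech (assumed formulation)."""
    if not source_labels or len(source_labels) != len(converted_logits):
        raise ValueError("need one label per frame of logits, and at least one frame")
    total = 0.0
    for label, logits in zip(source_labels, converted_logits):
        probs = softmax(logits)
        total += -math.log(probs[label])
    return total / len(source_labels)


def generator_loss(base_loss, source_labels, converted_logits, asr_weight=1.0):
    """Hypothetical combined objective: the usual generator loss plus a
    weighted ASR consistency term. asr_weight is an assumed hyperparameter."""
    return base_loss + asr_weight * asr_consistency_loss(source_labels, converted_logits)


if __name__ == "__main__":
    labels = [0, 1]                       # ASR labels recognized on the source
    preserved = [[4.0, 0.0], [0.0, 4.0]]  # converted speech keeps the content
    collapsed = [[0.0, 4.0], [4.0, 0.0]]  # converted speech loses the content
    print(generator_loss(1.0, labels, preserved))
    print(generator_loss(1.0, labels, collapsed))
```

Under this assumed formulation, a conversion that keeps the recognized content is penalized less than one that collapses it, so the gradient pushes the generator toward retaining linguistic information.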