Host: The Japanese Society for Artificial Intelligence
Name : The 97th SIG-SLUD
Number : 97
Location : [in Japanese]
Date : March 08, 2023 - March 09, 2023
Pages 01-04
Many videos in various languages are posted on video-sharing websites such as YouTube, and watching them is a promising form of listening practice for second language learners. However, many of these videos were not produced as listening materials, and some speakers have distinctive accents or other characteristics that make them difficult for learners to understand. For this reason, learners often adjust the playback speed to one that is easy for them to follow. This research aims to provide an environment in which learners can adjust the accent of the speaker in a video to be closer to that of their mother tongue, making the speech easier to listen to, so that accent adjustment combined with speed adjustment provides additional scaffolding. To this end, we investigated generative adversarial networks (GANs) and other voice conversion methods, and conducted speech conversion experiments using MelGAN-VC. The results confirmed that it is difficult to suppress noise to a level that does not bother learners.