A Study of a Voice Conversion Method Using MelGAN-VC for Second Language Learners

森 清忠; 三好 康夫

doi:10.11517/jsaislud.97.0_01

Abstract

Many videos in various languages are posted on video-sharing websites such as YouTube. Watching these videos is a promising form of listening practice for second language learners. However, many of them were not produced as listening materials, and some speakers have distinctive accents or other characteristics that make them difficult for learners to understand. For this reason, learners often adjust the playback speed to one that is easy for them to follow. This research aims to provide an environment in which learners can adjust the accent of the speaker in a video to be closer to that of their mother tongue, making it easier to listen to and, in combination with speed adjustment, strengthening the scaffolding effect. We investigated generative adversarial networks (GANs) and other voice conversion methods for this purpose and conducted voice conversion experiments using MelGAN-VC. The results confirmed that it is difficult to suppress noise to a level that does not bother learners.
