Making an English Speech Similar to the User’s Voice using UTAU and Interactive Differential Evolution

Taichi MIYAMOTO; Haoran GAN; Makoto FUKUMOTO

doi:10.5057/ijae.IJAE-D-22-00015

Abstract

Practicing English pronunciation is difficult for non-native speakers because of differences in vowels and consonants. There are several ways to practice, such as shadowing. However, if the features of the model voice differ greatly from the learner's own voice, the model pronunciation is likely to be difficult for the learner to reproduce. To solve this problem, we propose a method that makes the model pronunciation data similar to the learner's voice by using UTAU and Interactive Differential Evolution (IDE). A listening experiment was conducted with a concrete system combining IDE and UTAU. Twelve examinees took part, each running ten generations of paired comparisons to make the voices similar to their own voices as heard inside their heads. As a result, we successfully made the voices similar to the examinees' voices. Because it relies on paired comparisons instead of scoring, we believe that paired-comparison-based IDE is a better method than the general Interactive Genetic Algorithm with scoring.
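
The paired-comparison loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the differential evolution variant (rand/1/bin), the values of f and cr, the population size, and the encoding of a voice as a vector of values in [0, 1] are all assumptions, and UTAU is not called at all. The function names ide_paired and make_simulated_listener are hypothetical. In a real system, prefer would synthesize both candidates and ask the listener which one sounds closer to their own voice; here a simulated listener with a hidden target stands in for the human.

```python
import random


def make_simulated_listener(target):
    """Return a stand-in for the human listener (assumption for illustration).

    The returned function answers one paired comparison: True if candidate x
    is strictly closer to the hidden target voice than candidate y.
    """
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, target))

    def prefer(x, y):
        return dist(x) < dist(y)

    return prefer


def ide_paired(pop, prefer, generations=10, f=0.5, cr=0.9, rng=None):
    """Interactive DE sketch in which selection is a paired comparison.

    pop: list of parameter vectors with values in [0, 1] (hypothetical
         voice parameters); needs at least four individuals.
    prefer(trial, parent): True if the listener picks the trial.
    """
    rng = rng or random.Random(0)
    dim = len(pop[0])
    pop = [list(v) for v in pop]
    for _ in range(generations):
        next_pop = []
        for i, parent in enumerate(pop):
            # Pick three distinct individuals other than the parent.
            others = [j for j in range(len(pop)) if j != i]
            a, b, c = (pop[j] for j in rng.sample(others, 3))
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = a[d] + f * (b[d] - c[d])
                    value = min(1.0, max(0.0, value))  # keep within [0, 1]
                else:
                    value = parent[d]
                trial.append(value)
            # The only evaluation is one comparison: trial versus parent.
            next_pop.append(trial if prefer(trial, parent) else parent)
        pop = next_pop
    return pop


if __name__ == "__main__":
    seed_rng = random.Random(1)
    start = [[seed_rng.random() for _ in range(3)] for _ in range(6)]
    listener = make_simulated_listener([0.3, 0.7, 0.5])
    for voice in ide_paired(start, listener, generations=10):
        print([round(x, 3) for x in voice])
```

The point of the sketch is that the listener never assigns a numeric score. Each individual survives or is replaced based on a single two-way choice, which is what distinguishes this scheme from an interactive genetic algorithm with scoring.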
