International Symposium on Affective Science and Engineering
Online ISSN : 2433-5428
ISASE2022
Session ID : PM-2B-4
Conference information

Cognitive Science & Artificial Intelligence
Making an English Speech Resemble the User’s Voice Using UTAU and Interactive Evolutionary Computation
Taichi MIYAMOTO, Haoran GAN, Makoto FUKUMOTO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In general, learning English is difficult for non-native speakers because of differences in vowels and consonants. There are ways to practice English pronunciation, such as shadowing. However, if the voice features of the model audio differ greatly from the learner’s own voice, this may impede learning and sound reproduction. To solve this problem, we propose a method that makes the model pronunciation data resemble the learner’s own voice by using UTAU and Interactive Evolutionary Computation. The experiments showed that this method was capable of finding highly evaluated solutions. A Wilcoxon signed-rank test was used to examine the statistical difference between the evaluations of the initial and final generations, and a significant difference was observed (p < 0.01). Regarding the pitch parameters, we found different tendencies between male and female examinees. This indicates that the parameters were actually making the voice similar to the examinee’s voice. However, some problems remained, such as parameters that did not work well, the UTAU voice quality, and the lack of female examinees. In future experiments, we plan to eliminate or at least reduce the effects of these problems and build a better system that helps English learners study more efficiently.
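The search loop described in the abstract can be illustrated with a minimal sketch of interactive evolutionary computation. This is not the authors' implementation: the genome layout (four normalized voice parameters), the population size, the genetic operators, and the names `evolve` and `simulated_rating` are all assumptions made for illustration. In a real system a human listener would score each synthesized voice; here a simulated rating that rewards closeness to a hidden target stands in for the listener so that the sketch can run on its own.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]  # hypothetical: each gene is a voice parameter scaled to [0, 1]


def simulated_rating(genome: Genome, target: Genome) -> float:
    """Stand-in for the human listener: higher when the genome is closer to the target."""
    diff = sum(abs(g - t) for g, t in zip(genome, target)) / len(genome)
    return 1.0 - diff


def evolve(
    rate: Callable[[Genome], float],
    n_genes: int = 4,
    pop_size: int = 8,
    generations: int = 10,
    mutation_sd: float = 0.1,
    seed: int = 0,
) -> Tuple[Genome, List[float]]:
    """Run a small elitist GA; return the best genome and the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history: List[float] = []
    for gen in range(generations):
        # In an interactive setting, this is where the user listens and scores.
        scores = [rate(g) for g in population]
        ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
        history.append(ranked[0][0])
        if gen == generations - 1:
            break
        parents = [g for _, g in ranked[: pop_size // 2]]
        children = [list(ranked[0][1])]  # elitism: keep the best candidate unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = [
                min(1.0, max(0.0, x + rng.gauss(0.0, mutation_sd)))  # mutate, then clamp
                for x in a[:cut] + b[cut:]
            ]
            children.append(child)
        population = children
    return ranked[0][1], history


if __name__ == "__main__":
    hidden_target = [0.2, 0.8, 0.5, 0.3]  # hypothetical "learner's voice" parameters
    best, history = evolve(lambda g: simulated_rating(g, hidden_target))
    print("best genome:", [round(x, 2) for x in best])
    print("best score per generation:", [round(s, 3) for s in history])
```

Because the best candidate is carried over unchanged, the best score never decreases from one generation to the next as long as the rating is deterministic. Human ratings are noisier than this simulated one, which is one reason the abstract compares initial and final generations with a statistical test rather than relying on a single run.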

© 2022 Japan Society of Kansei Engineering