Host: The Japan Society of Mechanical Engineers
Name: [in Japanese]
Date: September 20, 2022 - September 22, 2022
This study estimates the speaker's emotion from both the voice signal and the transcribed utterance text. First, acoustic features are extracted with openSMILE from speech captured through a microphone, and a machine-learning classifier assigns them to emotion classes. Next, the utterance is transcribed by a speech recognition engine, the resulting text is segmented by morphological analysis, and emotions are estimated from the text. In parallel, the speaker's facial expressions and head and neck movements are tracked by image analysis of video of the speaker. The estimated emotions are then fused and mapped onto an avatar, which expresses the speaker's emotions through its facial expressions and movements.
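As a rough illustration of the acoustic branch of this pipeline, the following Python sketch extracts utterance-level features with openSMILE and classifies them into emotion classes. The eGeMAPSv02 feature set, the SVM classifier, and all file paths and labels are illustrative assumptions; the abstract only states that openSMILE features are classified by machine learning.

    # Sketch of the acoustic emotion-estimation step (assumptions: the
    # opensmile and scikit-learn packages, eGeMAPSv02 functionals, an SVM
    # classifier, and hypothetical file paths and labels).
    import opensmile
    from sklearn.svm import SVC

    # Extract utterance-level (functional) acoustic features with openSMILE.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )

    # Hypothetical labelled training utterances: (wav path, emotion label).
    train_data = [
        ("utt_happy.wav", "happy"),
        ("utt_sad.wav", "sad"),
        ("utt_angry.wav", "angry"),
        ("utt_neutral.wav", "neutral"),
    ]

    X_train = [smile.process_file(path).values.flatten() for path, _ in train_data]
    y_train = [label for _, label in train_data]

    # Train a simple classifier over emotion classes on the acoustic features.
    clf = SVC()
    clf.fit(X_train, y_train)

    # Estimate the emotion class of a new utterance captured from the microphone.
    features = smile.process_file("new_utterance.wav").values.flatten()
    print(clf.predict([features])[0])

The text branch (speech recognition, morphological analysis, text-based emotion classification) and the image branch (facial expression and head tracking) would produce their own emotion estimates, which the study then fuses before driving the avatar.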