Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
Expressive facial animation plays an important role in communication. Some avatars can display facial expressions using face tracking, one of the typical facial-expression synchronization methods, but facial expressions cannot be created from previously recorded speech or from synthetic speech that carries no facial-expression information. In this study, we propose a method to generate facial animation from voice alone. Specifically, a learning model is designed that takes the acoustic features of the uttered speech as input and uses the Action Unit (AU) parameters analyzed from facial-expression video as teacher data. The experimental results indicated that the loss value of the proposed method was lower than that of the existing method. In addition, the AU activations produced by the proposed method fluctuated more smoothly than those of the existing method, which suggests that they will be perceived as more natural facial expressions.
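As a rough illustration of the setup described above, the sketch below pairs per-frame acoustic features with AU teacher data in a frame-level regression model. It is a minimal sketch under assumptions not stated in the abstract: the architecture (a GRU encoder with a linear head), the feature dimensionality (40 MFCC coefficients), the number of AUs (17), the sigmoid output range, and the MSE objective are all illustrative choices, and VoiceToAUModel is a hypothetical name, not the authors' implementation.

```python
import torch
import torch.nn as nn


class VoiceToAUModel(nn.Module):
    """Hypothetical regressor from per-frame acoustic features to AU activations."""

    def __init__(self, n_acoustic: int = 40, n_aus: int = 17, hidden: int = 128):
        super().__init__()
        # Recurrent encoder over the sequence of acoustic frames.
        self.encoder = nn.GRU(n_acoustic, hidden, num_layers=2, batch_first=True)
        # Per-frame regression head; sigmoid keeps activations in [0, 1].
        self.head = nn.Linear(hidden, n_aus)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, n_acoustic) -> (batch, frames, n_aus)
        hidden_states, _ = self.encoder(feats)
        return torch.sigmoid(self.head(hidden_states))


if __name__ == "__main__":
    model = VoiceToAUModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    # Dummy batch standing in for (acoustic features, AU teacher data) pairs.
    feats = torch.randn(8, 100, 40)      # 8 utterances, 100 frames, 40 features
    au_targets = torch.rand(8, 100, 17)  # AU activations analyzed from video

    pred = model(feats)
    loss = loss_fn(pred, au_targets)     # the kind of loss value the abstract compares
    loss.backward()
    optimizer.step()
    print(f"MSE loss: {loss.item():.4f}")
```

A recurrent encoder is one plausible way to obtain the smooth frame-to-frame AU trajectories the abstract reports, since the hidden state carries context across frames; the actual model, features, and loss used in the paper may differ.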