Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
35th (2021)
Session ID : 3E2-OS-5b-01

Estimating Feedback Responses and the Intensity of Facial Expressions based on Multimodal Information
*Ryosuke UENO, Tatsuya SAKATO, Yukiko NAKANO
Abstract

Providing feedback to a speaker is an essential communication signal for maintaining a conversation. In addition to verbal feedback responses, facial expressions are an effective modality for conveying the listener's reaction to the speaker's utterances. Moreover, not only the type of facial expression but also its intensity may influence the meaning of a specific feedback response. In this study, we propose a multimodal deep neural network model that predicts the intensity of facial expressions co-occurring with feedback responses. We collected 33 video-mediated conversations among groups of three people and obtained language, facial, and audio data for each participant. We also annotated feedback responses and classified them by clustering the BERT embeddings of their expressions. In the proposed method, a decoder with an attention mechanism over the audio, visual, and language modalities produces intensities for 17 facial action units (AUs) frame by frame, and it is trained jointly with a feedback-label classifier by multi-task learning. In the evaluation of feedback-label prediction, performance was biased across categories. For AU intensity prediction, the multi-task model achieved a smaller loss than the single-task model, indicating a better model.
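The abstract gives no implementation details, but the labeling step can be illustrated with a short sketch: embed each annotated feedback expression with BERT and cluster the embeddings, using the cluster IDs as feedback-response labels. The checkpoint, the [CLS]-pooling choice, the English example utterances, and the number of clusters below are assumptions for illustration, not the authors' settings.

```python
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

# Hypothetical feedback expressions; the paper's data is conversational
# and not included here.
utterances = ["uh-huh", "really?", "I see", "wow", "right"]

# The specific BERT checkpoint is not stated in the abstract,
# so this choice is an assumption.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    enc = tok(utterances, padding=True, return_tensors="pt")
    # Use the [CLS] vector as a fixed-length embedding of each expression
    emb = bert(**enc).last_hidden_state[:, 0, :].numpy()

# Cluster the embeddings; cluster IDs serve as feedback-response labels.
# The number of clusters is an illustrative assumption, not the paper's value.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(emb)
print(labels)
```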
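Likewise, the multi-task architecture can be read as per-modality sequence encoders, learned attention over the three modalities at each frame, a regression head for the 17 AU intensities, and a classification head for the feedback label, trained with a combined loss. Everything in this PyTorch sketch (feature dimensions, the GRU encoders, the attention form, the number of label classes, the loss weight alpha) is a hypothetical reading of the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskFeedbackModel(nn.Module):
    def __init__(self, audio_dim=88, visual_dim=136, lang_dim=768,
                 hidden_dim=128, num_aus=17, num_labels=5):
        super().__init__()
        # Per-modality encoders over frame-level feature sequences
        self.audio_enc = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.visual_enc = nn.GRU(visual_dim, hidden_dim, batch_first=True)
        self.lang_enc = nn.GRU(lang_dim, hidden_dim, batch_first=True)
        # Attention weights over the three modalities at each frame
        self.attn = nn.Linear(hidden_dim, 1)
        # Decoder head: frame-by-frame intensities for the 17 AUs
        self.au_head = nn.Linear(hidden_dim, num_aus)
        # Classifier head: feedback label for the whole segment
        self.label_head = nn.Linear(hidden_dim, num_labels)

    def forward(self, audio, visual, lang):
        # Each input: (batch, frames, dim), aligned frame by frame
        a, _ = self.audio_enc(audio)
        v, _ = self.visual_enc(visual)
        l, _ = self.lang_enc(lang)
        m = torch.stack([a, v, l], dim=2)        # (batch, frames, 3, hidden)
        w = torch.softmax(self.attn(m), dim=2)   # attention over modalities
        fused = (w * m).sum(dim=2)               # (batch, frames, hidden)
        au_intensity = self.au_head(fused)       # (batch, frames, 17)
        label_logits = self.label_head(fused.mean(dim=1))  # (batch, labels)
        return au_intensity, label_logits

# Multi-task loss: AU-intensity regression plus feedback-label classification
def multitask_loss(au_pred, au_true, label_logits, label_true, alpha=1.0):
    mse = nn.functional.mse_loss(au_pred, au_true)
    ce = nn.functional.cross_entropy(label_logits, label_true)
    return mse + alpha * ce
```

Under this reading, the single-task baseline mentioned in the abstract would correspond to training with the MSE term alone, while the multi-task variant adds the cross-entropy term for the feedback label.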

© 2021 The Japanese Society for Artificial Intelligence