Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3Q1-OS-19a-02

Emotional Space Using Supervised Crossmodal Contrastive Learning with Human Rating Distribution
*Seiichi HARATA, Takuto SAKUMA, Shohei KATO

Abstract

This study aims to acquire a mathematical representation of emotions from sensor data as a data-driven approach to emotion modeling. To represent human emotions, a modern representation learning method embeds multimodal expressions into a shared latent representation (an emotional space). The proposed method uses supervised contrastive learning to place emotionally similar data pairs closer together and dissimilar pairs farther apart, regardless of modality. Human emotions do not always fall into a single category but may be a complex mixture. We therefore take the rating distribution formed by multiple raters' judgments of a single video, treat it as a soft label, and augment the supervised contrastive loss with the similarity between soft labels. In experiments on audio-visual data, we evaluate the robustness of emotion recognition when a modality is missing and confirm that the proposed method obtains shared representations of emotions across modalities in a low-dimensional emotional space. We also visualize the emotional space and examine how non-emotion-related information, such as the actor's gender, is allocated, in order to evaluate how well the proposed method represents the semantic relationships of human emotions.
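
To illustrate the idea of weighting the supervised contrastive loss by soft-label similarity, the following is a minimal sketch, not the authors' released code. It assumes PyTorch, a batch of embeddings pooled from both modalities, and per-sample rating distributions as soft labels; the function name `soft_supcon_loss` and the choice of cosine similarity between label distributions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_supcon_loss(embeddings: torch.Tensor,
                     soft_labels: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, d) batch from all modalities; soft_labels: (N, C) rating distributions."""
    z = F.normalize(embeddings, dim=1)                 # project onto the unit hypersphere
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)

    # Pairwise similarities scaled by temperature; exclude self-pairs from the softmax.
    logits = torch.matmul(z, z.T) / temperature
    logits = logits.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)    # avoid 0 * (-inf) below

    # Soft positive weights: similarity between rating distributions
    # (cosine similarity here; one possible choice), normalized per anchor.
    p = F.normalize(soft_labels, dim=1)
    weights = torch.matmul(p, p.T).masked_fill(self_mask, 0.0)
    weights = weights / weights.sum(dim=1, keepdim=True).clamp_min(1e-8)

    # Weighted contrastive objective: pull together pairs with similar ratings,
    # regardless of which modality each embedding came from.
    return -(weights * log_prob).sum(dim=1).mean()
```

In training, audio and visual embeddings of the same clips would be stacked into one batch so that cross-modal pairs with similar rating distributions are drawn together in the shared emotional space.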

© 2023 The Japanese Society for Artificial Intelligence