Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3Q1-OS-19a-02

Emotional Space Using Supervised Crossmodal Contrastive Learning with Human Rating Distribution
*Seiichi HARATA, Takuto SAKUMA, Shohei KATO

Abstract

This study aims to acquire a mathematical representation of emotions from sensor data as a data-driven approach to emotion modeling. To represent human emotions, a modern representation learning method embeds multimodal expressions into a shared latent representation (an emotional space). The proposed method uses supervised contrastive learning to place emotionally similar data pairs closer together and dissimilar pairs farther apart, regardless of modality. Human emotions do not always fall into a single category but may be a complex mixture. We therefore take the rating distribution formed by multiple raters' judgments of a single video, treat it as a soft label, and augment the supervised contrastive loss with the similarity between soft labels. In experiments on audio-visual data, we evaluate the robustness of emotion recognition when a modality is missing and confirm that the proposed method obtains shared representations of emotions across modalities in a low-dimensional emotional space. We also visualize the emotional space and examine how non-emotion-related information, such as the actor's gender, is allocated, in order to evaluate how well the proposed method represents the semantic relationships of human emotions.
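
To illustrate the idea of weighting the supervised contrastive loss by soft-label similarity, the following is a minimal sketch, not the authors' released code. It assumes PyTorch, a batch of embeddings pooled from both modalities, and per-sample rating distributions as soft labels; the function name `soft_supcon_loss` and the choice of cosine similarity between label distributions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_supcon_loss(embeddings: torch.Tensor,
                     soft_labels: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, d) batch from all modalities; soft_labels: (N, C) rating distributions."""
    z = F.normalize(embeddings, dim=1)                 # project onto the unit hypersphere
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)

    # Pairwise similarities scaled by temperature; exclude self-pairs from the softmax.
    logits = torch.matmul(z, z.T) / temperature
    logits = logits.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)    # avoid 0 * (-inf) below

    # Soft positive weights: similarity between rating distributions
    # (cosine similarity here; one possible choice), normalized per anchor.
    p = F.normalize(soft_labels, dim=1)
    weights = torch.matmul(p, p.T).masked_fill(self_mask, 0.0)
    weights = weights / weights.sum(dim=1, keepdim=True).clamp_min(1e-8)

    # Weighted contrastive objective: pull together pairs with similar ratings,
    # regardless of which modality each embedding came from.
    return -(weights * log_prob).sum(dim=1).mean()
```

In training, audio and visual embeddings of the same clips would be stacked into one batch so that cross-modal pairs with similar rating distributions are drawn together in the shared emotional space.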

© 2023 The Japanese Society for Artificial Intelligence