IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
A Multimodal Emotion Recognition Model with Intra-modal Enhancement and Inter-modal Interaction
Zhe ZHANG, Yiding WANG, Jiali CUI, Han ZHENG
Advance online publication

Article ID: 2024EDP7161

Abstract

Multimodal Emotion Recognition (MER) is a critical task in sentiment analysis. Current methods primarily focus on multimodal fusion and the representation of emotions, but they fail to capture the collaborative interactions between modalities effectively. In this study, we propose an MER model with intra-modal enhancement and inter-modal interaction (IEII). First, the model extracts emotional information from the text, audio, and video modalities using RoBERTa, openSMILE, and DenseNet, respectively. A Large Enhanced Kernel Attention (LEKA) module is designed, which uses a simplified attention mechanism with large convolutional kernels to enhance intra-modal emotional information and align the modalities effectively. A multimodal representation space is then constructed with Transformer encoders to explore inter-modal interactions. Finally, a Dual-Branch Multimodal Attention Fusion (DMAF) module based on grouped-query attention and rapid attention mechanisms integrates the multimodal emotion representations to perform recognition. Experimental results show that the model achieves higher overall accuracy and F1-scores on the IEMOCAP and MELD datasets than existing methods, demonstrating that it effectively enhances intra-modal emotional information and captures inter-modal interactions.

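The abstract describes the LEKA module only at a high level. The following is a minimal sketch of a large-kernel attention block for sequence features, assuming LEKA follows the general "large-kernel attention" pattern (depthwise convolutions with a wide receptive field whose output gates the input features). The 1-D formulation, kernel sizes, and dimensions here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class LargeKernelAttention1D(nn.Module):
    """Gates per-modality sequence features with a large-kernel depthwise convolution."""

    def __init__(self, channels: int, kernel_size: int = 21, dilation: int = 3):
        super().__init__()
        # Depthwise conv with a small kernel captures local context.
        self.dw_conv = nn.Conv1d(channels, channels, kernel_size=5,
                                 padding=2, groups=channels)
        # Dilated depthwise conv widens the receptive field cheaply.
        self.dw_dilated = nn.Conv1d(channels, channels, kernel_size=kernel_size,
                                    padding=(kernel_size // 2) * dilation,
                                    dilation=dilation, groups=channels)
        # Pointwise conv mixes channels to produce the attention map.
        self.pw_conv = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        return x * attn  # element-wise gating acts as a simplified attention


if __name__ == "__main__":
    feats = torch.randn(4, 256, 50)   # e.g. 50 time steps of 256-d audio features
    out = LargeKernelAttention1D(256)(feats)
    print(out.shape)                  # torch.Size([4, 256, 50])
```

Compared with standard self-attention, this convolutional gating avoids the quadratic cost in sequence length, which is consistent with the abstract's description of a "simplified attention mechanism with large convolutional kernels".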
© 2025 The Institute of Electronics, Information and Communication Engineers