IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532

This article has an officially published version; please refer to that version, and cite it when citing this work.

A Multimodal Emotion Recognition Model with Intra-modal Enhancement and Inter-modal Interaction
Zhe ZHANG, Yiding WANG, Jiali CUI, Han ZHENG
Journal, free access; advance online publication

Article ID: 2024EDP7161

Abstract

Multimodal Emotion Recognition (MER) is a critical task in sentiment analysis. Current methods focus primarily on the fusion and representation of multimodal emotions but fail to effectively capture the collaborative interactions between modalities. In this study, we propose an MER model with intra-modal enhancement and inter-modal interaction (IEII). First, the model extracts emotional information from the text, audio, and video modalities using RoBERTa, openSMILE, and DenseNet, respectively. A Large Enhanced Kernel Attention (LEKA) module, built on a simplified attention mechanism with large convolutional kernels, enhances intra-modal emotional information and aligns the modalities. Next, a multimodal representation space constructed with transformer encoders is proposed to explore inter-modal interactions. Finally, a Dual-Branch Multimodal Attention Fusion (DMAF) module based on grouped-query attention and rapid attention mechanisms integrates the multimodal emotion representations to perform recognition. Experimental results show that the model achieves higher overall accuracy and F1-scores than existing methods on the IEMOCAP and MELD datasets, demonstrating that it effectively enhances intra-modal emotional information and captures inter-modal interactions.
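The abstract does not spell out LEKA's internals, but attention built from large convolutional kernels is commonly realized by decomposing one large convolution into a small depthwise convolution, a dilated depthwise convolution, and a pointwise convolution whose output re-weights the input features. The following is a minimal PyTorch sketch under that assumption; the class name LEKASketch and all hyperparameters are hypothetical illustrations, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LEKASketch(nn.Module):
    """Hypothetical sketch of a large-kernel attention block.

    Assumes the usual large-kernel decomposition: a small depthwise
    convolution, a dilated depthwise convolution, and a pointwise
    convolution whose output re-weights the input features.
    """

    def __init__(self, channels: int, kernel_size: int = 21, dilation: int = 3):
        super().__init__()
        # Small depthwise convolution capturing local context.
        self.dw = nn.Conv1d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)
        # Dilated depthwise convolution approximating the large kernel.
        k = kernel_size // dilation
        self.dw_dilated = nn.Conv1d(channels, channels, kernel_size=k,
                                    padding=(k // 2) * dilation,
                                    dilation=dilation, groups=channels)
        # Pointwise convolution mixing channels into an attention map.
        self.pw = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length) features of one modality.
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # attention-weighted intra-modal enhancement


# Example: enhance a batch of 256-dim sequence features of length 100.
enhanced = LEKASketch(256)(torch.randn(2, 256, 100))
```

Depthwise and dilated convolutions keep the parameter count near-linear in the channel count, which is what makes such a "simplified" large-kernel attention cheaper than full self-attention over the sequence.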
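Likewise, the DMAF module's grouped-query attention is described only by name. In a standard grouped-query attention layer, several query heads share each key/value head, reducing the key/value projection cost; the sketch below illustrates that mechanism in a cross-modal setting. The class name, shapes, and head counts are assumptions, and the "rapid attention" branch is not sketched here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQASketch(nn.Module):
    """Hypothetical grouped-query attention layer: groups of query
    heads share each key/value head, shrinking the K/V projections."""

    def __init__(self, dim: int, num_heads: int = 8, num_kv_heads: int = 2):
        super().__init__()
        assert num_heads % num_kv_heads == 0 and dim % num_heads == 0
        self.h, self.kv_h = num_heads, num_kv_heads
        self.d = dim // num_heads
        self.q_proj = nn.Linear(dim, self.h * self.d)
        self.kv_proj = nn.Linear(dim, 2 * self.kv_h * self.d)
        self.out_proj = nn.Linear(self.h * self.d, dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, Lq, dim) one modality's representation.
        # context: (batch, Lk, dim) another modality's representation.
        b, lq, _ = query.shape
        lk = context.shape[1]
        q = self.q_proj(query).view(b, lq, self.h, self.d).transpose(1, 2)
        kv = self.kv_proj(context).view(b, lk, 2, self.kv_h, self.d)
        k = kv[:, :, 0].transpose(1, 2)  # (batch, kv_h, Lk, d)
        v = kv[:, :, 1].transpose(1, 2)
        # Replicate each K/V head across its group of query heads.
        rep = self.h // self.kv_h
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, lq, self.h * self.d)
        return self.out_proj(out)
```

With 8 query heads sharing 2 key/value heads, the key/value projections are a quarter the size of standard multi-head attention, which is the usual motivation for grouped-query attention in fusion modules of this kind.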

© 2025 The Institute of Electronics, Information and Communication Engineers