Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Word-Aware Modality Stimulation for Multimodal Fusion
Shuhei Tateishi, Yasuhito Osugi, Makoto Nakatsuji

2025 Volume 40 Issue 3 Pages D-O92_1-10

Abstract

Multimodal learning is generally expected to yield more accurate predictions than text-only analysis. Various methods for fusing multimodal inputs have been proposed for sentiment analysis. However, we found that their fusion mechanisms, which are based on attention-based language models, may be prevented from learning the non-verbal modalities. Non-verbal modalities are isolated from linguistic semantics and context and do not contain them, so they are poorly suited to attention with the text modality during the fusion phase. To address this issue, we propose Word-Aware Modality Stimulation Fusion (WA-MSF), which facilitates the integration of non-verbal modalities with the text modality. The core of WA-MSF is the Modality Stimulation Unit layer (MSU-layer). It integrates linguistic context and semantics into the non-verbal modalities, thereby instilling linguistic essence into them. Moreover, WA-MSF uses aMLP in the fusion phase to exploit the spatial and temporal representations of non-verbal modalities more effectively than transformer-based fusion. In our experiments, WA-MSF achieved new state-of-the-art performance on sentiment prediction tasks.
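The abstract does not specify the internals of the MSU-layer. The Python below is a minimal sketch of the general idea it describes, in which non-verbal features absorb linguistic context before fusion. It uses ordinary scaled dot-product cross-attention: each non-verbal frame queries the word embeddings, and the attended text context is added back as a residual. This generic technique is swapped in for illustration only and is not the authors' MSU-layer. The function `stimulate_with_words`, the identity query/key/value projections, the shared feature dimension, and the toy vectors are all hypothetical assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def stimulate_with_words(nonverbal: List[Vector], words: List[Vector]) -> List[Vector]:
    """Hypothetical word-aware stimulation via scaled dot-product cross-attention.

    Assumptions (not from the paper): frames and words share one feature
    dimension, and the learned Q/K/V projections are replaced by identity maps.
    Each non-verbal frame (query) attends over the word embeddings (keys and
    values), and the attended linguistic context is added to the frame.
    """
    d = len(words[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for frame in nonverbal:
        # Attention weights of this frame over every word.
        weights = softmax([dot(frame, w) * scale for w in words])
        # Weighted sum of word embeddings = linguistic context for this frame.
        context = [sum(a * w[i] for a, w in zip(weights, words)) for i in range(d)]
        # Residual: keep the original non-verbal signal, add word context.
        out.append([f + c for f, c in zip(frame, context)])
    return out


# Toy usage: two word embeddings and two acoustic frames (d = 2).
words = [[1.0, 0.0], [0.0, 1.0]]
audio = [[10.0, 0.0], [0.0, 0.0]]
stimulated = stimulate_with_words(audio, words)
```

The residual connection preserves the original acoustic or visual signal while adding word-level context. In the toy run, the all-zero second frame scores both words equally and therefore receives their average, `[0.5, 0.5]`. The first frame is dominated by the word it aligns with.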

© JSAI (The Japanese Society for Artificial Intelligence)