2026, Vol. 30, No. 2, pp. 472-485
Multimodal sentiment analysis (MSA) is a crucial technique for understanding sentiment expressed in social media, product reviews, and other multimedia content, and has been studied extensively in recent years. However, most existing MSA methods depend on large labeled datasets; collecting such data is costly and time-consuming, which limits the practical applicability of these models. To address this challenge, this paper proposes a few-shot multimodal sentiment analysis method based on dynamic adjustment and contrastive learning (DACL-FMSA). First, the method uses the BLIP model to generate semantic descriptions of images. These descriptions are then aligned with the text inputs to bridge the semantic gap between modalities and enable more effective multimodal fusion. Second, within a contrastive learning framework, the model's ability to capture emotional features is enhanced by generating diverse views of the image and text data, improving performance on few-shot tasks. Finally, to further optimize the learning process, this study designs a dynamic learning rate adjustment method based on a long short-term memory (LSTM) network, which adjusts the learning rate according to gradient changes to accelerate convergence and achieve better training results. Experimental results show that DACL-FMSA achieves significant performance improvements across multiple benchmark datasets: on Twitter-15, Twitter-17, and MASAD, it reaches accuracies of 61.88%, 54.00%, and 81.78%, respectively, and on MVSA-S and TumEmo, accuracies of 67.89% and 54.42%. These results consistently demonstrate the effectiveness of DACL-FMSA.
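The abstract does not give the exact formulation of the LSTM-based learning rate controller, so the following is only a minimal illustrative sketch, assuming the simplest possible setup: a scalar-input, hidden-size-1 LSTM cell consumes a sequence of gradient-norm statistics and its final hidden state is mapped to a multiplicative learning-rate factor. All names (`LSTMCell`, `adjusted_lr`) and the choice of `1 + tanh(h)` as the bounded scaling factor are hypothetical, not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal scalar-input LSTM cell (hidden size 1) in pure Python."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One (w_x, w_h, bias) triple per gate: input, forget, output, candidate.
        self.w = {k: (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0)
                  for k in "ifog"}
        self.h = 0.0  # hidden state
        self.c = 0.0  # cell state

    def step(self, x):
        def gate(k, act):
            wx, wh, b = self.w[k]
            return act(wx * x + wh * self.h + b)

        i = gate("i", sigmoid)      # input gate
        f = gate("f", sigmoid)      # forget gate
        o = gate("o", sigmoid)      # output gate
        g = gate("g", math.tanh)    # candidate cell update
        self.c = f * self.c + i * g
        self.h = o * math.tanh(self.c)
        return self.h


def adjusted_lr(base_lr, grad_norms, cell):
    """Feed gradient-norm statistics through the LSTM, then scale the base
    learning rate by 1 + tanh(h), keeping the factor inside (0, 2)."""
    h = 0.0
    for gnorm in grad_norms:
        h = cell.step(gnorm)
    return base_lr * (1.0 + math.tanh(h))
```

In a training loop one would call `adjusted_lr(1e-3, recent_grad_norms, cell)` once per step (or per window of steps), so the schedule reacts to the observed gradient trajectory rather than following a fixed decay; the actual controller in the paper is presumably trained jointly rather than randomly initialized as here.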