Paper ID: 2024EDP7313
Joint multimodal aspect-based sentiment analysis (JMABSA) aims to extract aspects from multimodal inputs and determine their sentiment polarity. Existing research often struggles to effectively align aspect features across images and text. To address this, we propose an entity knowledge-guided image-text alignment network that integrates alignment across both modalities, enabling the model to more accurately capture aspect and sentiment information expressed jointly in images and text. Specifically, we introduce an entity class embedding to guide the model in learning entity-related features from text. Additionally, we utilize scene and aspect descriptions of images as entity knowledge, helping the model learn entity-relevant features from visual input. Aligning this image-derived entity knowledge with the original text further supports the model in learning consistent aspect and sentiment expressions across modalities. Experimental results on two public benchmark datasets demonstrate that our method achieves state-of-the-art performance.
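As a rough illustration of the alignment idea described above, the following is a minimal PyTorch-style sketch showing how a learnable entity class embedding might guide pooled text features, and how image-side entity knowledge (encoded scene and aspect descriptions) might be contrastively aligned with the guided text representation. All names, dimensions, and the contrastive objective here are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of entity-guided image-text alignment.
# Class/attribute names (EntityGuidedAlignment, entity_class_embed, ...)
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityGuidedAlignment(nn.Module):
    def __init__(self, dim=768, num_entity_classes=4):
        super().__init__()
        # One learnable embedding per entity class, used to steer the
        # text representation toward entity-related features (assumption).
        self.entity_class_embed = nn.Embedding(num_entity_classes, dim)
        self.text_proj = nn.Linear(dim, dim)
        self.image_proj = nn.Linear(dim, dim)

    def forward(self, text_feats, image_knowledge_feats, entity_class_ids):
        # text_feats:            (B, dim) pooled text features
        # image_knowledge_feats: (B, dim) encoded scene/aspect descriptions
        # entity_class_ids:      (B,)     entity class index per example
        guided_text = self.text_proj(
            text_feats + self.entity_class_embed(entity_class_ids)
        )
        image_side = self.image_proj(image_knowledge_feats)

        # Contrastive alignment between image-derived entity knowledge
        # and the entity-guided text representation.
        t = F.normalize(guided_text, dim=-1)
        v = F.normalize(image_side, dim=-1)
        logits = t @ v.t() / 0.07  # temperature-scaled cosine similarity
        targets = torch.arange(t.size(0))
        align_loss = (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets)) / 2
        return align_loss
```

In this sketch the alignment loss would be added to the usual aspect extraction and sentiment classification objectives; the actual fusion and decoding components of the proposed network are not shown.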