2025 Volume 29 Issue 4 Pages 956-967
Image–text retrieval, a fundamental task in the cross-modal domain, centers on exploring semantic consistency and achieving precise alignment between related image–text pairs. Existing approaches primarily depend on co-occurrence frequency to construct coherent representations when introducing commonsense knowledge, thereby facilitating high-quality semantic alignment between the two modalities. However, these methods often overlook the conceptual and syntactic correspondences between cross-modal fragments. To overcome these limitations, this work proposes a consensus knowledge-guided semantic enhanced interaction method, referred to as CSEI, for image–text retrieval. The method correlates both intra-modal and inter-modal semantics between image regions or objects and sentence words, aiming to minimize cross-modal discrepancies. Specifically, the first step constructs visual and textual corpus sets that encapsulate rich concepts and relationships derived from commonsense knowledge. Next, to enhance intra-modal relationships, a semantic relation-aware graph convolutional network is employed to capture more comprehensive feature representations. For inter-modal similarity reasoning, local and global similarity features are extracted through two cross-modal semantic enhancement mechanisms. In the final stage, the approach integrates commonsense knowledge with internal semantic correlations to enrich the concept representations and further refines semantic consistency by regularizing the importance disparities among association-enhanced concepts. Experiments on the MS-COCO and Flickr30K benchmarks validate the effectiveness of the proposed method.
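To make the pipeline described above more concrete, the following is a minimal PyTorch sketch of two of its ingredients: a relation-aware graph convolution that refines intra-modal fragment features (image regions or sentence words), and a local-plus-global cross-modal similarity score. The abstract does not specify the actual architecture of CSEI, so all module names, the affinity-based edge construction, and the aggregation choices here are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareGCN(nn.Module):
    """One graph-convolution layer whose adjacency is inferred from
    pairwise feature affinities. This is a stand-in for the paper's
    semantic relation-aware graph; the real edge construction is not
    described in the abstract."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, n_fragments, dim)
        # Affinity between every pair of intra-modal fragments.
        attn = torch.bmm(self.query(x), self.key(x).transpose(1, 2))
        attn = F.softmax(attn / x.size(-1) ** 0.5, dim=-1)
        # Propagate neighbour information with a residual connection.
        return F.relu(x + self.update(torch.bmm(attn, x)))

def local_global_similarity(regions, words):
    """Cross-modal similarity at two granularities (an assumed scheme):
    local  - each word is matched to its best-aligned image region,
    global - pooled image embedding vs. pooled sentence embedding."""
    regions = F.normalize(regions, dim=-1)  # (b, n_regions, d)
    words = F.normalize(words, dim=-1)      # (b, n_words, d)
    # Local term: word-to-region cosine similarities, max over regions,
    # averaged over words.
    local = torch.bmm(words, regions.transpose(1, 2)).max(dim=-1).values.mean(dim=-1)
    # Global term: cosine similarity of mean-pooled representations.
    g_img = F.normalize(regions.mean(dim=1), dim=-1)
    g_txt = F.normalize(words.mean(dim=1), dim=-1)
    global_ = (g_img * g_txt).sum(dim=-1)
    return local + global_                  # (b,) similarity scores

if __name__ == "__main__":
    gcn = RelationAwareGCN(dim=256)
    img = gcn(torch.randn(2, 36, 256))  # e.g. 36 region features per image
    txt = gcn(torch.randn(2, 12, 256))  # e.g. 12 word features per sentence
    print(local_global_similarity(img, txt).shape)  # torch.Size([2])
```

In practice the resulting similarity scores would feed a ranking loss (e.g., a triplet loss over matched and mismatched pairs); the commonsense corpus sets and the regularization of concept importance described in the abstract are additional components not covered by this sketch.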