IEICE Transactions on Information and Systems, Vol. E108.D, No. 1, pp. 82-91, 2025
Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images across the visible and infrared modalities, enabling all-day surveillance. Existing VI-ReID methods have achieved promising performance by exploiting global information for identity-related discriminative learning. However, they often overlook local information, which contributes significantly to learning identity-specific discriminative cues. Moreover, the substantial modality gap typically hinders model training. To address these issues, we propose a VI-ReID method called partial enhancement and channel aggregation (PECA), which makes contributions in three aspects. First, to capture local information, we introduce a global-local similarity learning (GSL) module, which compels the encoder to attend to fine-grained details by increasing the similarity between global and local features in multiple feature spaces. Second, to narrow the modality gap, we propose inter-modality channel aggregation learning (ICAL), which progressively guides the learning of modality-invariant features; ICAL not only alleviates the modality gap but also augments the training data. Third, we introduce a novel instance-modality contrastive loss, which facilitates the learning of modality-invariant and identity-related features at both the instance and modality levels. Extensive experiments on the SYSU-MM01 and RegDB datasets show that PECA outperforms state-of-the-art methods.
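The abstract does not spell out GSL's formulation. The following is a minimal PyTorch sketch of one plausible reading: local features are pooled from horizontal stripes of the backbone feature map and pulled toward the global descriptor by maximizing cosine similarity. The stripe count, the pooling operator, and the function name are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def global_local_similarity_loss(feat_map: torch.Tensor,
                                 num_parts: int = 4) -> torch.Tensor:
    """Pull stripe-level local features toward the global feature.

    feat_map: (B, C, H, W) encoder output. Horizontal stripes stand in
    for 'local' regions (an assumption, not the paper's design).
    """
    global_feat = feat_map.mean(dim=(2, 3))                   # (B, C)
    # Pool each horizontal stripe into one local descriptor.
    stripes = feat_map.chunk(num_parts, dim=2)
    local_feats = torch.stack([s.mean(dim=(2, 3)) for s in stripes],
                              dim=1)                          # (B, P, C)
    # Maximizing cosine similarity between every local feature and the
    # global one pushes the encoder to retain fine-grained cues.
    sim = F.cosine_similarity(local_feats,
                              global_feat.unsqueeze(1), dim=-1)  # (B, P)
    return (1.0 - sim).mean()

# Usage: loss_gsl = global_local_similarity_loss(backbone(images))
```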
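Likewise, the channel-aggregation operation is only named in the abstract. The sketch below assumes one common realization in VI-ReID: the RGB channels of a visible image are averaged into a single-channel, infrared-like view, then blended back with a progressively increasing coefficient. The blending scheme, the linear warm-up schedule, and the function name are all hypothetical.

```python
import torch

def channel_aggregated_view(vis_batch: torch.Tensor,
                            alpha: float) -> torch.Tensor:
    """Create an intermediate-modality view of a visible RGB batch.

    vis_batch: (B, 3, H, W) visible images.
    alpha: aggregation strength in [0, 1]; larger values move the view
           closer to a single-channel, infrared-like appearance.
    """
    # Aggregate the three color channels and broadcast the result back,
    # mimicking the channel statistics of the infrared input.
    gray = vis_batch.mean(dim=1, keepdim=True).expand_as(vis_batch)
    # Blending original and aggregated images, with alpha ramped up over
    # training, realizes a progressive schedule and also yields extra
    # augmented training samples.
    return (1.0 - alpha) * vis_batch + alpha * gray

# Assumed linear warm-up over 20 epochs:
# alpha = min(1.0, epoch / 20)
# aug = channel_aggregated_view(images_rgb, alpha)
```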
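Finally, a hedged sketch of an objective consistent with the abstract's two-level description: an instance-level InfoNCE term that contrasts each anchor only against samples of the other modality, plus a modality-level term that aligns per-identity modality centroids. The temperature, the masking, and the centroid formulation are assumptions, not the paper's published loss.

```python
import torch
import torch.nn.functional as F

def instance_level_term(feats, labels, modality, temperature=0.1):
    """Cross-modality InfoNCE: each anchor is contrasted only against
    other-modality samples; same-identity ones are the positives.
    feats: (B, D); labels: (B,) identity ids; modality: (B,) 0=VIS, 1=IR.
    Batches are assumed to contain both modalities for every identity."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature
    cross = modality.unsqueeze(0) != modality.unsqueeze(1)  # other-modality pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & cross
    # Restrict the softmax to cross-modality candidates.
    logits = sim.masked_fill(~cross, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Mean log-likelihood of each anchor's cross-modality positives.
    pos_count = pos.sum(dim=1).clamp(min=1)
    return -(log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_count).mean()

def modality_level_term(feats, labels, modality):
    """Align each identity's visible and infrared centroids."""
    feats = F.normalize(feats, dim=1)
    terms = []
    for pid in labels.unique():
        vis = feats[(labels == pid) & (modality == 0)]
        ir = feats[(labels == pid) & (modality == 1)]
        if len(vis) and len(ir):
            terms.append(1.0 - F.cosine_similarity(vis.mean(0),
                                                    ir.mean(0), dim=0))
    return torch.stack(terms).mean() if terms else feats.new_zeros(())
```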