Large-Scale Indoor Search Engine with Multimodal Foundation Models and Relaxing Contrastive Loss

Yuto Imai; Kanta Kaneda; Ryosuke Korekata; Komei Sugiura

doi:10.11517/pjsai.JSAI2024.0_3O5OS16c04

38th (2024)

Session ID : 3O5-OS-16c-04

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 38

Date : May 28, 2024 - May 31, 2024

Large-Scale Indoor Search Engine with Multimodal Foundation Models and Relaxing Contrastive Loss

*Yuto IMAI, Kanta KANEDA, Ryosuke KOREKATA, Komei SUGIURA

Keywords: Learning to rank

Abstract

In this paper, we focus on the learning-to-rank physical objects task, in which images of objects in large-scale indoor environments are ranked according to open-vocabulary user instructions. We introduce the GREP module, which constructs visual features that account for the image, target-object, relative-position, and pixel granularities. We also introduce the RCS module, which learns efficiently from the redundant images captured in an indoor environment. Our method outperformed baseline methods on the newly constructed YAGAMI dataset and on an extended LTRRIE-subset, with significant improvements in the standard metrics.
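
The ranking setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it does not implement the GREP or RCS modules, and it assumes that the instruction and each image have already been embedded into a shared vector space by some multimodal model. The toy embeddings, the image ids, and the helper names (`cosine`, `rank_images`, `recall_at_k`, `reciprocal_rank`) are hypothetical. The abstract does not name its "standard metrics"; Recall@K and reciprocal rank are used here only as common learning-to-rank examples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_vec, image_vecs):
    """Return image ids sorted by descending similarity to the query."""
    scores = {img_id: cosine(query_vec, vec) for img_id, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant images that appear in the top-k results."""
    hits = sum(1 for img_id in ranking[:k] if img_id in relevant)
    return hits / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant image, or 0.0 if none is retrieved."""
    for pos, img_id in enumerate(ranking, start=1):
        if img_id in relevant:
            return 1.0 / pos
    return 0.0


# Hypothetical, hand-made 3-d embeddings standing in for model outputs.
query = [1.0, 0.0, 0.0]            # embedding of the user instruction
images = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
    "img_d": [0.0, 0.0, 1.0],
}
relevant = {"img_a", "img_c"}      # assumed ground-truth target images

ranking = rank_images(query, images)
print(ranking[:2], recall_at_k(ranking, relevant, 2), reciprocal_rank(ranking, relevant))
```

In this toy case the two relevant images are scored highest, so both metrics reach their maximum; with real embeddings the same functions would be averaged over many instructions.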
