Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 3O5-OS-16c-04

Large-Scale Indoor Search Engine with Multimodal Foundation Models and Relaxing Contrastive Loss
*Yuto IMAI, Kanta KANEDA, Ryosuke KOREKATA, Komei SUGIURA
Keywords: Learning to rank

Abstract

In this paper, we focus on the learning-to-rank physical objects task, in which images of objects in large-scale indoor environments are ranked according to open-vocabulary user instructions. We introduce the GREP module, which constructs visual features at four granularities: image, target object, relative position, and pixel. We also introduce the RCS module to learn efficiently from the redundant images captured in indoor environments. Our method outperformed the baseline methods on the newly constructed YAGAMI dataset and on an extended LTRRIE-subset, with significant improvements on standard ranking metrics.
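As a rough illustration of the task setting only, the sketch below ranks candidate images by cosine similarity between an instruction embedding and image embeddings, and computes a standard InfoNCE contrastive loss over a batch of matched text/image pairs. It is not the authors' method: the GREP features and the relaxed loss used by the RCS module are not specified here, the embeddings are assumed to come from some multimodal foundation model, and the function names `rank_images` and `info_nce` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_emb: Vector, image_embs: Sequence[Vector]) -> List[int]:
    """Hypothetical helper: return image indices ordered from most to least
    similar to the instruction embedding."""
    scores = [cosine(text_emb, emb) for emb in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def info_nce(text_embs: Sequence[Vector], image_embs: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Standard (not relaxed) InfoNCE loss, text-to-image direction.

    The i-th text is assumed to match the i-th image; every other image in
    the batch is treated as a negative.
    """
    n = len(text_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_embs[i], image_embs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching image as the target class.
        total += log_sum_exp - logits[i]
    return total / n


if __name__ == "__main__":
    query = [1.0, 0.0]
    images = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
    print(rank_images(query, images))  # [1, 2, 0]
```

In this standard loss, a near-duplicate image of the same object taken from another viewpoint would still be pushed away as a negative. The abstract's RCS module targets learning from such redundant images, but its actual formulation is not given in this summary.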

© 2024 The Japanese Society for Artificial Intelligence