Host: The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Methods for analyzing image data associated with linguistic information have attracted attention in recent years, but they face challenges when the amount of data varies across image domains. In response, LADS was proposed: a model that can be trained without image data from domains with few samples by exploiting the joint embedding space between images and text in vision-language models. However, LADS typically relies on simple hand-written domain description text, and more suitable text can improve model performance. To address this, we apply CoOp, a method that optimizes the domain text given to CLIP by learning prompts, thereby improving the accuracy of vision-language models such as CLIP. We expect the learned prompts to represent the diverse domains handled by LADS more effectively. Finally, we validate the efficacy of the proposed method on real data, demonstrating that it mitigates the imbalance in data quantities across image domains.
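The core idea of CoOp described above (replacing a hand-written prompt with learnable context vectors that are optimized against frozen image features) can be illustrated with a minimal sketch. Everything below is a stand-in for illustration only: real CoOp passes the learnable context vectors and the class-name token through CLIP's frozen transformer text encoder, whereas here the "encoder" is simple mean pooling, the image features are random placeholders, and the class-specific context variant of CoOp is assumed.

```python
import numpy as np

# Toy sketch of CoOp-style prompt learning (class-specific context variant).
# All features are random stand-ins; real CoOp uses frozen CLIP encoders.
rng = np.random.default_rng(0)
D, M, K, N = 16, 4, 3, 30        # embed dim, context length, classes, images

cls_emb = rng.normal(size=(K, D))     # frozen class-name token embeddings
img_feat = rng.normal(size=(N, D))    # frozen "CLIP" image features (fake)
labels = rng.integers(0, K, size=N)
onehot = np.eye(K)[labels]

ctx = 0.01 * rng.normal(size=(K, M, D))   # learnable per-class context vectors

def loss_and_grad(ctx):
    # "Text encoder" = mean pooling over [context tokens; class token].
    t = (ctx.sum(axis=1) + cls_emb) / (M + 1)        # (K, D) text features
    logits = img_feat @ t.T                          # (N, K) image-text scores
    logits -= logits.max(axis=1, keepdims=True)      # softmax stability
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(N), labels] + 1e-12).mean()
    dlogits = (p - onehot) / N                       # cross-entropy gradient
    dt = dlogits.T @ img_feat                        # (K, D)
    return loss, dt[:, None, :] / (M + 1)            # broadcasts to ctx shape

loss0, _ = loss_and_grad(ctx)
for _ in range(300):                 # plain gradient descent on the prompts
    loss, g = loss_and_grad(ctx)
    ctx = ctx - 1.0 * g
# Only the context vectors are updated; class embeddings and image
# features stay frozen, mirroring how CoOp trains prompts around a
# frozen CLIP. The cross-entropy loss drops as the prompts adapt.
```

The design point this sketch captures is that prompt learning touches none of the backbone weights: only the small `ctx` tensor receives gradients, which is why CoOp-style optimization is cheap enough to run per domain, as envisioned for the domain descriptions in LADS.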