Proceedings of the 38th Annual Conference of JSAI (2024)
Online ISSN: 2758-7347
Session ID: 1B3-GS-2-05

A method for improving the accuracy of a multi-domain adaptive vision-language model using prompt learning
*Zhenyu Gao, Ayako Yamagiwa, Masayuki Goto
Abstract

Methods for analyzing image data associated with linguistic information have attracted recent attention, but they face challenges when the amount of available data varies across image domains. In response, LADS was proposed: a model that can be trained without image data from domains with few samples by exploiting the shared embedding space between images and text in vision-language models. However, LADS typically relies on simple hand-written domain descriptions, and more suitable descriptions can improve model performance. To address this, we apply CoOp, a prompt-learning method that optimizes the domain description text for CLIP and thereby improves the accuracy of the vision-language model. We expect the learned prompts to represent the diverse domains within LADS more effectively than hand-written text. Finally, we validate the proposed method on real data, demonstrating its ability to mitigate imbalanced data quantities across image domains.
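As an illustration of the prompt-learning step described above, the following is a minimal sketch of CoOp-style context optimization for CLIP domain descriptions. It assumes PyTorch and OpenAI's CLIP package (github.com/openai/CLIP); identifiers such as DomainPromptLearner, n_ctx, and domain_names are illustrative assumptions, not names taken from the paper.

import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class DomainPromptLearner(nn.Module):
    """CoOp-style learnable context vectors that replace a hand-written
    domain description such as "a photo of a {domain}" (sketch only)."""

    def __init__(self, clip_model, domain_names, n_ctx=4):
        super().__init__()
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]
        # Learnable context vectors, shared across all domains.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim, dtype=dtype) * 0.02)
        # Tokenize placeholder prompts; the "X" tokens are overwritten by self.ctx.
        prompts = [" ".join(["X"] * n_ctx) + " " + name + "." for name in domain_names]
        self.tokenized = torch.cat([clip.tokenize(p) for p in prompts])
        with torch.no_grad():
            emb = clip_model.token_embedding(self.tokenized).type(dtype)
        # Fixed parts of the sequence: start token and the domain-name suffix.
        self.register_buffer("prefix", emb[:, :1, :])           # SOS token
        self.register_buffer("suffix", emb[:, 1 + n_ctx:, :])   # name, ".", EOS, pad

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.prefix.size(0), -1, -1)
        return torch.cat([self.prefix, ctx, self.suffix], dim=1)

def encode_prompts(clip_model, prompt_emb, tokenized):
    """Run CLIP's text transformer on pre-built prompt embeddings."""
    x = prompt_emb + clip_model.positional_embedding.type(clip_model.dtype)
    x = clip_model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)  # NLD <-> LND
    x = clip_model.ln_final(x).type(clip_model.dtype)
    # Take the feature at the EOS position, as CLIP's own text encoder does.
    eos = tokenized.argmax(dim=-1)
    return x[torch.arange(x.size(0)), eos] @ clip_model.text_projection

# Usage with hypothetical domain names: in a LADS-style pipeline, CLIP itself
# stays frozen and only learner.ctx is optimized.
model, _ = clip.load("ViT-B/32", device="cpu")
learner = DomainPromptLearner(model, ["sketch", "painting"])
text_features = encode_prompts(model, learner(), learner.tokenized)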

© 2024 The Japanese Society for Artificial Intelligence