Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 1O5-GS-7-01

Searching optimal caption in learning Text-to-Image model
*Jumpei NAKAO, Masaru ISONUMA, Junichiro MORI, Ichiro SAKATA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Text-to-Image models require training datasets consisting of a very large number of image-caption pairs. Because the captions in such datasets are manually annotated, they are not necessarily optimal for training Text-to-Image models. In this study, we propose a learning framework that trains a Text-to-Image model while optimizing the captions used for training. Specifically, we introduce a model that outputs pseudo captions from images, and we alternately update the parameters of this caption model and the Text-to-Image model through bilevel optimization. As a preliminary experiment, we evaluate the effectiveness of bilevel optimization for training Text-to-Image models.
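
The abstract does not give the update rules, so the following is only a minimal toy sketch of what alternating updates under bilevel optimization can look like. It is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar "caption model" parameter phi, the scalar "Text-to-Image model" parameter w, the squared-error losses, the single training image, the held-out validation pair, the learning rates, and the one-step unrolled approximation of the outer gradient.

```python
# Toy sketch of alternating bilevel updates. All names and values are hypothetical.
# phi: parameter of a stand-in caption model,        caption   = phi * image
# w:   parameter of a stand-in text-to-image model,  image_hat = w * caption


def train_grad_w(w, phi, image):
    """Gradient w.r.t. w of the inner loss (w * phi * image - image) ** 2."""
    caption = phi * image  # pseudo caption produced from the image
    return 2.0 * (w * caption - image) * caption


def val_loss(w, val_caption, val_image):
    """Outer loss: reconstruction error on a held-out caption-image pair."""
    return (w * val_caption - val_image) ** 2


def hypergradient(w, phi, image, val_caption, val_image, inner_lr):
    """Gradient w.r.t. phi of the outer loss after one virtual inner step.

    Uses the one-step unrolled approximation
    w_next = w - inner_lr * train_grad_w(w, phi, image).
    """
    w_next = w - inner_lr * train_grad_w(w, phi, image)
    dval_dw = 2.0 * (w_next * val_caption - val_image) * val_caption
    # d(train_grad_w)/d(phi) = 2 * image**2 * (2 * w * phi - 1)
    dwnext_dphi = -inner_lr * 2.0 * image ** 2 * (2.0 * w * phi - 1.0)
    return dval_dw * dwnext_dphi


def run(steps=2000, inner_lr=0.1, outer_lr=0.1):
    w, phi = 1.0, 1.0                   # arbitrary starting point
    image = 1.0                         # single toy training "image"
    val_caption, val_image = 1.0, 2.0   # toy held-out pair
    for _ in range(steps):
        # Outer step: adjust the caption model so that training on its
        # pseudo captions lowers the held-out loss.
        phi -= outer_lr * hypergradient(
            w, phi, image, val_caption, val_image, inner_lr
        )
        # Inner step: train the text-to-image stand-in on the pseudo caption.
        w -= inner_lr * train_grad_w(w, phi, image)
    return w, phi, val_loss(w, val_caption, val_image)


if __name__ == "__main__":
    w, phi, loss = run()
    print(f"w={w:.4f} phi={phi:.4f} val_loss={loss:.2e}")
```

With these toy values the iterates should settle near w = 2 and phi = 0.5, where w * phi = 1 satisfies the inner objective and w = 2 satisfies the outer one. In a realistic setting both models would be neural networks and the outer gradient would typically be obtained by automatic differentiation rather than by hand.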

© 2023 The Japanese Society for Artificial Intelligence