Host : The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
Text-to-image models require datasets consisting of a huge number of image-caption pairs for training. Because the captions in such datasets are annotated manually, they are not necessarily optimal for training text-to-image models. In this study, we propose a learning framework that trains a text-to-image model while also optimizing the captions used for training. Specifically, we introduce a captioning model that outputs pseudo captions from images, and we alternately update the parameters of this captioning model and those of the text-to-image model through bilevel optimization. As a preliminary experiment, we evaluate the effectiveness of bilevel optimization for training text-to-image models.
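The alternating update described in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the abstract does not specify the models, the losses, or how the upper-level gradient is computed. Everything below is an assumption made for illustration: images and captions are scalars, the hypothetical captioner is `c = a * x`, the hypothetical generator is `x_hat = w * c`, the lower level fits `w` on pseudo captions, and the upper level adjusts `a` using a one-step lookahead gradient of a validation loss computed with reference captions.

```python
# Toy bilevel optimization with alternating updates (illustrative only).
# Assumptions, none of which come from the abstract: scalar "images" x,
# a captioner c = a * x, a generator x_hat = w * c, squared-error losses,
# and a one-step lookahead (unrolled) hypergradient for the upper level.


def mean(values):
    return sum(values) / len(values)


def train_grad_w(w, a, xs):
    # d/dw of mean((w*a*x - x)^2): generator trained on pseudo captions a*x.
    return mean([2 * a * x * (w * a * x - x) for x in xs])


def train_mixed_derivative(w, a, xs):
    # d^2/(dw da) of mean((w*a*x - x)^2) = mean(2 * x^2 * (2*w*a - 1)).
    return mean([2 * x * x * (2 * w * a - 1) for x in xs])


def val_loss(w, xs, cs):
    # Generator evaluated on held-out images with reference captions cs.
    return mean([(w * c - x) ** 2 for x, c in zip(xs, cs)])


def val_grad_w(w, xs, cs):
    # d/dw of the validation loss above.
    return mean([2 * c * (w * c - x) for x, c in zip(xs, cs)])


def run_bilevel_toy(steps=200, lr_w=0.1, lr_a=0.5):
    train_x = [1.4, -0.2, 0.2, -1.4]      # training images (captions are learned)
    val_x = [1.0, -1.0]                   # validation images
    val_c = [2.0 * x for x in val_x]      # reference captions use scale 2

    w, a = 1.0, 1.0                       # generator and captioner parameters
    for _ in range(steps):
        # Upper level: look one generator step ahead, then differentiate the
        # validation loss with respect to the captioner parameter a.
        w_look = w - lr_w * train_grad_w(w, a, train_x)
        dw_da = -lr_w * train_mixed_derivative(w, a, train_x)
        hypergrad = val_grad_w(w_look, val_x, val_c) * dw_da
        a -= lr_a * hypergrad

        # Lower level: one gradient step of the generator on pseudo captions.
        w -= lr_w * train_grad_w(w, a, train_x)

    return w, a, val_loss(w, val_x, val_c)


if __name__ == "__main__":
    w, a, loss = run_bilevel_toy()
    print(f"generator w={w:.4f}, captioner a={a:.4f}, val loss={loss:.6f}")
```

In this toy setting the captioner parameter moves toward the caption scale under which the generator, trained only on pseudo captions, also does well on the validation pairs. A realistic system would replace the hand-derived derivatives with automatic differentiation and the scalars with neural networks.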