Host : The Japanese Society for Artificial Intelligence
Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 36
Location : [in Japanese]
Date : June 14, 2022 - June 17, 2022
Text-to-image generation aims to generate an image from a text that describes a scene, such as its objects and scenery. Existing methods use an attention mechanism to implicitly learn the correspondence between words in the text and regions in the image from text-image pairs. However, objects specified in the text often do not appear in the generated image. In this paper, we propose a text-to-image generation model that explicitly learns the correspondence between objects in the text and objects in the generated image to improve object coverage. The proposed method applies object detection to the generated image and encourages missing objects to appear by introducing a loss function that accounts for the completeness of the correspondence between objects in the text and in the image. We demonstrate that our model outperforms existing methods in object coverage.
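
To make the notions of object coverage and a completeness-oriented loss concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that the objects named in the text are available as a list of labels and that an object detector returns, for the generated image, a best confidence score per label. The function names `object_coverage` and `missing_object_loss`, and the negative-log penalty, are hypothetical choices made for illustration only.

```python
import math


def object_coverage(text_objects, detected_objects):
    """Fraction of the objects named in the text that were detected in the image."""
    wanted = set(text_objects)
    if not wanted:
        return 1.0  # nothing was requested, so nothing can be missing
    return len(wanted & set(detected_objects)) / len(wanted)


def missing_object_loss(text_objects, detection_scores, eps=1e-8):
    """Illustrative penalty (an assumption, not the paper's loss).

    For each object named in the text, take the detector's best confidence
    for that label (0.0 if the label was not detected at all) and average
    the negative log of those confidences. Objects that are absent or only
    weakly detected contribute a large penalty.
    """
    wanted = set(text_objects)
    if not wanted:
        return 0.0
    total = 0.0
    for name in wanted:
        score = detection_scores.get(name, 0.0)
        total += -math.log(max(score, eps))  # clamp to avoid log(0)
    return total / len(wanted)
```

For a text mentioning a dog and a frisbee, an image in which only the dog is detected covers half of the requested objects, and the missing frisbee dominates the penalty. In an actual training setup the detector scores would have to be differentiable with respect to the generator's output for such a term to provide a gradient; that part is outside the scope of this sketch.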