Image Generation with Improved Object Coverage via Object Matching

石井 尚悟; 山崎 禎晃; 伊東 聖矢; 大原 剛三

doi:10.11517/pjsai.JSAI2022.0_2O1GS705

Abstract

Text-to-image generation aims to generate images from a given text that describes scene information such as objects and scenery. Existing methods implicitly learn the correspondence between words in the text and regions in an image from text-image pairs through an attention mechanism. However, objects specified in the text often do not appear in the generated image. In this paper, we propose a text-to-image generation model that explicitly learns the correspondence between objects in the text and objects in the generated image in order to improve object coverage. The proposed method applies object detection to the generated image and encourages missing objects to appear by introducing a loss function that accounts for the completeness of the correspondence between objects in the text and objects in the image. We demonstrate that our model outperforms existing methods in object coverage.
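The abstract does not give the form of the loss, so the following is only a minimal sketch of what an object-coverage penalty could look like, not the authors' formulation. It assumes the text has already been reduced to a list of object names and that a detector returns (label, score) pairs for the generated image; the function names, the negative-log penalty, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math

def object_coverage_loss(text_objects, detections, eps=1e-6):
    """Hypothetical penalty: mean negative log of the best detection
    score for each object named in the text.

    text_objects: list of object names extracted from the text (assumed given).
    detections:   list of (label, score) pairs from a detector run on the
                  generated image (assumed given), with scores in [0, 1].
    """
    if not text_objects:
        return 0.0
    total = 0.0
    for name in text_objects:
        # Best score among detections whose label matches this text object;
        # 0.0 when the object is missing from the image entirely.
        best = max((s for label, s in detections if label == name), default=0.0)
        # Clamp to eps so a missing object gives a large but finite penalty.
        total += -math.log(max(best, eps))
    return total / len(text_objects)

def object_coverage(text_objects, detections, threshold=0.5):
    """Fraction of text objects detected with score >= threshold (assumed metric)."""
    if not text_objects:
        return 1.0
    detected = {label for label, s in detections if s >= threshold}
    return sum(name in detected for name in text_objects) / len(text_objects)
```

In this sketch an image that contains every named object gets a lower penalty than one that omits an object. A real training setup would need the penalty to be differentiable with respect to the generator's output, which plain Python floats do not provide.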
