Toward Automatic Generation of Graphic Layouts Using Large Multimodal Models

王 力敏; 脇 聡志; 鈴村 豊太郎

doi:10.11517/pjsai.JSAI2024.0_3F1GS1005

Abstract

Recent advances in generative models have made it possible for AI, rather than humans, to generate graphic layouts. Some existing layout generation methods use not only information about each element but also constraints such as the relationships between elements. However, these methods often require humans to specify the constraints, which can be burdensome. They are also limited to the category information of layout elements, such as “image”, “text”, and “title”, and do not take into account the detailed content of the images or text themselves. This study therefore proposes a method that leverages the detailed content of elements and automatically generates the constraints used for layout generation. Since elements can be either images or text, we explore the use of large multimodal models for extracting detailed content. This approach enables the automatic generation of graphic layouts with less human input.
