Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence (JSAI)
Online ISSN : 2758-7347
36th (2022)
Session ID: 1S4-IS-1-05

Customizable text-based visual content creation with self-supervised learning
*Jun-Li LU, Yoichi OCHIAI
Abstract

AI generation of images from textual descriptions has demonstrated advanced and attractive capabilities. However, commonly trained machine-learning models and existing AI-based systems may fail to produce satisfactory results for customized usage, possibly because of a limited understanding of textual expressions or low customizability of the trained text-to-image models. We therefore assist in creating flexible and diverse visual content from textual descriptions. In modeling, we generate synthesized images from word-visual co-occurrence with a Transformer model, synthesizing the images by decoding visual tokens. To improve visual and textual expressions and their relevance with greater diversity, we apply contrastive learning to texts, images, or pairs of texts and images. In experiments on a dataset of birds, we showed that rendering quality requires neural networks of a certain scale and a training process that fine-tunes with relatively low learning rates until the end of training. We further showed that contrastive learning can improve visual and textual expressions and their relevance.
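The contrastive learning on pairs of texts and images described above can be illustrated with a minimal sketch. The abstract does not specify the loss used, so the symmetric InfoNCE objective below (matching text-image pairs scored against in-batch negatives, as popularized by CLIP-style training) is an assumption for illustration; the function name, embedding shapes, and temperature value are all hypothetical.

```python
import numpy as np

def info_nce_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched text-image embeddings.

    text_emb, image_emb: arrays of shape (batch, dim); row i of each is a matched pair.
    Returns a scalar: the mean cross-entropy of picking the true partner
    against all in-batch negatives, averaged over both directions.
    """
    # L2-normalize so the dot product is cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (batch, batch); matched pairs on the diagonal
    idx = np.arange(len(t))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()    # true class = same-index partner

    # Average the text-to-image and image-to-text directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Under this objective, embeddings of matched text-image pairs are pulled together while mismatched in-batch pairs are pushed apart, which is one plausible reading of "improving visual and textual expressions and their relevance."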

© 2022 The Japanese Society for Artificial Intelligence