An Effort to Improve Caption Generation Performance Using Diffusion Processes

平野 理子; 小林 一郎

doi:10.11517/pjsai.JSAI2023.0_2E5GS601

Abstract

In recent years, generative models based on diffusion processes have achieved state-of-the-art performance in continuous domains and have been actively studied for discrete data generation. In this study, we propose a diffusion-based approach to caption generation that uses a language model and a classifier. To improve caption generation performance, we compare accuracy with and without a pre-trained language model in the classifier, and investigate the conditions under which appropriate captions can be generated for each image. Although the accuracy of our diffusion-based method was not high, we confirmed that natural language generation can be controlled by the performance of the classifier in the sampling process.
