Host : The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
In recent years, generative models based on diffusion processes have achieved state-of-the-art performance in continuous domains and have been actively studied for discrete data generation. In this study, we propose a caption generation method that combines a language model with a classifier, both based on a diffusion process. To improve captioning performance, we compare accuracy with and without a pre-trained language model in the classifier, and we investigate the conditions under which appropriate captions can be generated for each image. Although the accuracy of our diffusion-based method was not good, we confirmed that natural language generation can be controlled by the performance of the classifier in the sampling process.
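As a rough illustration of how a classifier can steer a diffusion sampler, the sketch below runs a toy reverse process in a continuous latent space and shifts each denoising mean along the gradient of a classifier's log-probability. It is not the method of this paper: `denoise_mean`, `classifier_log_prob_grad`, `guided_sample`, the `scale` and `sigma` parameters, and the quadratic toy classifier are all hypothetical stand-ins chosen so that the example runs on its own.

```python
import numpy as np

def classifier_log_prob_grad(x, target):
    # Toy classifier (assumption): log p(y | x) = -0.5 * ||x - target||^2 + const,
    # so its gradient with respect to x is simply (target - x).
    return target - x

def denoise_mean(x, t):
    # Hypothetical stand-in for a learned denoiser: shrink the latent toward the origin.
    return 0.9 * x

def guided_sample(target, steps=50, dim=8, scale=50.0, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # start from Gaussian noise
    for t in range(steps, 0, -1):
        mean = denoise_mean(x, t)
        # Classifier guidance: shift the mean along the classifier gradient,
        # scaled by the step variance and a guidance strength.
        mean = mean + scale * sigma**2 * classifier_log_prob_grad(x, target)
        # No noise is added on the final step.
        noise = rng.standard_normal(dim) if t > 1 else np.zeros(dim)
        x = mean + sigma * noise
    return x

target = np.full(8, 3.0)
guided = guided_sample(target, scale=50.0)
unguided = guided_sample(target, scale=0.0)
print("distance with guidance:   ", np.linalg.norm(guided - target))
print("distance without guidance:", np.linalg.norm(unguided - target))
```

In this toy setting the guided sample ends much closer to the classifier's target than the unguided one. A weaker or less accurate classifier gradient would pull the sample less reliably, which is one way to read the dependence on classifier performance noted above.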