Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
In recent years, generative models based on diffusion processes have achieved state-of-the-art performance on continuous data and have also been actively studied for discrete data generation. This study addresses image caption generation, a controllable natural language processing task, using a diffusion language model. The aim is to develop an image captioning method that reflects not only the information obtained from the image but also the user's intention, which is estimated from the trajectory the user traces over the image. The user's level of interest in each object is estimated from the time spent tracing it, and the objects in the image are described in the order in which the user traces them, enabling interactive caption generation. The experiments show that the proposed method, by using a diffusion process, can express the user's intention estimated from the trace in a sentence generated non-autoregressively.
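To make the trace-based control signal concrete, the following is a minimal sketch of how dwell time and visiting order could be derived from a trace. It is not the authors' implementation: the abstract does not say how trace points are matched to objects, so the use of object bounding boxes, the timestamped trace format, and the names `TracePoint`, `Box`, and `summarize_trace` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TracePoint:
    x: float
    y: float
    t: float  # timestamp in seconds (assumed format)


@dataclass
class Box:
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: TracePoint) -> bool:
        return self.x0 <= p.x <= self.x1 and self.y0 <= p.y <= self.y1


def summarize_trace(trace, boxes):
    """Return [(label, dwell_seconds)] ordered by first visit.

    The interval between each point and the next one is credited to every
    box that contains the earlier point, so the final point adds no time.
    """
    dwell = {}  # dicts keep insertion order, which gives first-visit order
    for cur, nxt in zip(trace, trace[1:]):
        dt = nxt.t - cur.t
        for box in boxes:
            if box.contains(cur):
                dwell[box.label] = dwell.get(box.label, 0.0) + dt
    return list(dwell.items())


# Hypothetical example: the user traces the ball briefly, then the dog.
boxes = [Box("dog", 0, 0, 5, 5), Box("ball", 6, 6, 10, 10)]
trace = [
    TracePoint(8, 8, 0.0),
    TracePoint(9, 9, 0.25),
    TracePoint(1, 1, 0.5),
    TracePoint(2, 2, 1.5),
    TracePoint(3, 3, 2.0),
]
print(summarize_trace(trace, boxes))  # [('ball', 0.5), ('dog', 1.5)]
```

In this sketch the returned order ("ball" before "dog") would stand in for the order in which objects are described, and the dwell times would stand in for the level of interest in each object. How such signals condition the diffusion language model is not specified in the abstract and is left out here.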