Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4N3-GS-6-05

Caption Generation Reflecting User Intent Through a Diffusion Model
*Satoko HIRANO, Ichiro KOBAYASHI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In recent years, generative models based on a diffusion process have achieved state-of-the-art performance in processing information in continuous spaces, and have also been actively studied for discrete data generation. This study addresses image caption generation, a controllable natural language processing task, using a diffusion language model. The aim is to develop an image captioning method that reflects not only the information obtained from the image but also the user's intention, which is estimated from the trajectory (trace) the user draws over the image. The user's level of interest in each object is determined from the time the trace spends on it, and the objects in the image are described in the order in which the user traces them, realizing interactive caption generation. Experiments show that, by using a diffusion process, the proposed method can express the user's intention estimated from the trace in a sentence generated non-autoregressively.

© 2024 The Japanese Society for Artificial Intelligence