Toward Caption Generation That Reflects User Intent from Traces Using a Diffusion-Process-Based Model

平野 理子; 小林 一郎

doi:10.11517/pjsai.JSAI2024.0_4N3GS605

Abstract

In recent years, generative models based on a diffusion process have achieved state-of-the-art performance in processing continuous-space data and have also been actively studied for discrete data generation. This research addresses image caption generation, a controllable natural language processing task, using a diffusion language model. The aim is to develop an image captioning method that reflects not only the information obtained from the image but also the user's intention, which is estimated from the trajectory the user traces over the image. The user's level of interest in each object is determined from the time the trace spends on it, and the objects in the image are described in the order in which the user traces them, realizing interactive caption generation. The experiments show that, by using a diffusion process, the proposed method can express the user's intention estimated from the trace in a sentence generated non-autoregressively.
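
As a rough illustration of the trace processing the abstract describes, the sketch below turns a timestamped trace into per-object dwell times listed in order of first visit. The abstract does not say how interest or order is actually computed, so everything here is an assumption: the names `TracePoint`, `Box`, and `summarize_trace` are hypothetical, objects are assumed to be given as axis-aligned bounding boxes, and each time interval is credited to the first box containing the interval's starting point.

```python
from dataclasses import dataclass


@dataclass
class TracePoint:
    x: float
    y: float
    t: float  # timestamp in seconds (assumed unit)


@dataclass
class Box:
    # Hypothetical object region: axis-aligned bounding box with a label.
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: TracePoint) -> bool:
        return self.x0 <= p.x <= self.x1 and self.y0 <= p.y <= self.y1


def summarize_trace(trace, boxes):
    """Return (label, dwell_seconds) pairs in order of first visit.

    Assumption: the interval between two consecutive trace points is
    credited to the first box that contains the earlier point.
    """
    dwell = {}  # dict insertion order records first-visit order
    for prev, cur in zip(trace, trace[1:]):
        dt = cur.t - prev.t
        for box in boxes:
            if box.contains(prev):
                dwell[box.label] = dwell.get(box.label, 0.0) + dt
                break
    return list(dwell.items())


trace = [
    TracePoint(1.0, 1.0, 0.0),
    TracePoint(1.5, 1.5, 0.5),
    TracePoint(6.0, 6.0, 1.0),
    TracePoint(6.0, 6.0, 3.0),
    TracePoint(20.0, 20.0, 3.5),
]
boxes = [Box("dog", 0, 0, 2, 2), Box("ball", 5, 5, 7, 7)]
summary = summarize_trace(trace, boxes)
```

In a setup like this, the ordered labels could serve as the description order and the dwell times as a per-object interest signal for conditioning a caption generator; how the proposed method actually feeds such signals into the diffusion language model is not stated in the abstract.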
