NIHON GAZO GAKKAISHI (Journal of the Imaging Society of Japan)
Online ISSN : 1880-4675
Print ISSN : 1344-4425
ISSN-L : 1344-4425
Special Topic
Anticipation Captioning with Commonsense Knowledge
Duc Minh VO, Quoc-An LUONG, Akihiro SUGIMOTO, Hideki NAKAYAMA
JOURNAL FREE ACCESS

2023 Volume 62 Issue 6 Pages 588-598

Abstract

In this review, we introduce a novel image captioning task, called Anticipation Captioning, which generates a caption for an unseen image given a sparse, temporally ordered set of images. The task emulates the human capacity to reason about the future from a sparse collection of visual cues acquired over time. To address this challenge, we introduce a model, A-CAP, that predicts the caption by incorporating commonsense knowledge into a pre-trained vision-language model. Our method outperforms image captioning methods and provides a solid baseline for the anticipation captioning task, as shown in both qualitative and quantitative evaluations on a customized visual storytelling dataset. We also discuss the potential applications, challenges, and future directions of this task.

© 2023 by The Imaging Society of Japan