Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
Zhishen YangRaj DabreHideki TanakaNaoaki Okazaki

2024 Volume 31 Issue 3 Pages 1140-1165

Abstract

Figures in scholarly documents provide a straightforward means of communicating scientific findings to readers. Automating figure caption generation enhances model understanding of scientific documents beyond text and helps authors write informative captions. Unlike previous studies, we formulate scientific figure captioning as a knowledge-augmented image-captioning task, in which models must exploit knowledge embedded across modalities to generate captions. To this end, we extend the large-scale SciCap dataset (Hsu et al. 2021) to SciCap+, which includes mention paragraphs (paragraphs that mention the figures) and OCR tokens. We then conduct experiments using M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline. Our results indicate that mention paragraphs serve as additional contextual knowledge, significantly boosting standard automatic image-captioning evaluation scores compared to figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset are publicly available: https://github.com/ZhishenYang/scientific_figure_captioning_dataset
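For readers unfamiliar with the dataset layout, the sketch below illustrates one plausible shape of a SciCap+ record, pairing a figure with its caption, OCR tokens, and mention paragraphs as described in the abstract. The field names and values are hypothetical and are not taken from the released dataset; consult the linked repository for the actual format.

```python
# A minimal sketch (not the official loader) of what a single SciCap+ record
# could look like. All field names and example values below are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SciCapPlusRecord:
    figure_id: str                                  # identifier of the figure image
    caption: str                                    # ground-truth caption (generation target)
    ocr_tokens: List[str] = field(default_factory=list)          # text extracted from the figure by OCR
    mention_paragraphs: List[str] = field(default_factory=list)  # paragraphs in the paper mentioning the figure


# Example usage with made-up values:
record = SciCapPlusRecord(
    figure_id="1234.56789v1-Figure2-1",
    caption="Figure 2: BLEU scores as a function of training data size.",
    ocr_tokens=["BLEU", "10k", "100k", "1M"],
    mention_paragraphs=["As shown in Figure 2, performance improves steadily ..."],
)
print(record.figure_id, len(record.mention_paragraphs))
```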

© 2024 The Association for Natural Language Processing