Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
Zhishen Yang, Raj Dabre, Hideki Tanaka, Naoaki Okazaki
Journal, free access

2024, Volume 31, Issue 3, pp. 1140-1165

Abstract

Figures in scholarly documents offer a direct way of communicating scientific findings to readers. Automating figure caption generation extends model understanding of scientific documents beyond text and helps authors write informative captions. Unlike previous studies, we frame scientific figure captioning as a knowledge-augmented image-captioning task in which models must use knowledge embedded across modalities to generate captions. To this end, we extend the large-scale SciCap dataset (Hsu et al. 2021) to SciCap+, which adds mention paragraphs (paragraphs that mention the figures) and OCR tokens. We conduct experiments using the M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline. Our results indicate that mention paragraphs serve as additional context knowledge, significantly boosting standard automatic image-captioning evaluation scores compared with figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and the SciCap+ dataset are publicly available: https://github.com/ZhishenYang/scientific_figure_captioning_dataset

© 2024 The Association for Natural Language Processing