SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Zhishen Yang; Raj Dabre; Hideki Tanaka; Naoaki Okazaki

doi:10.5715/jnlp.31.1140

Abstract

Figures in scholarly documents provide a direct way to communicate scientific findings to readers. Automating figure caption generation improves models' understanding of scientific documents beyond text and helps authors write informative captions. Unlike previous studies, we frame scientific figure captioning as a knowledge-augmented image captioning task in which models must use knowledge embedded across modalities to generate captions. To this end, we extend the large-scale SciCap dataset (Hsu et al. 2021) to SciCap+, which adds mention paragraphs (paragraphs that mention figures) and OCR tokens. We then conduct experiments using M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline. Our results indicate that mention paragraphs serve as additional contextual knowledge and significantly improve standard automatic image captioning evaluation scores compared with figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset are publicly available: https://github.com/ZhishenYang/scientific_figure_captioning_dataset
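The abstract defines mention paragraphs as paragraphs that mention a figure. As a minimal sketch, assuming figures are referenced in body text as "Figure N" or "Fig. N", such paragraphs could be collected with a regular expression. The function name `find_mention_paragraphs` and the matching pattern are hypothetical illustrations, not the authors' actual SciCap+ extraction pipeline.

```python
import re


def find_mention_paragraphs(paragraphs: list[str], figure_number: int) -> list[str]:
    """Return the paragraphs that mention the given figure number.

    Hypothetical helper: assumes references look like "Figure 3", "Fig. 3",
    "Fig 3", or "figure 3" (case-insensitive).
    """
    # \b stops matches inside words such as "subfigure";
    # (?!\d) stops "Figure 3" from matching "Figure 30".
    pattern = re.compile(
        rf"\b(?:Figure|Fig\.?)\s*{figure_number}(?!\d)", re.IGNORECASE
    )
    return [p for p in paragraphs if pattern.search(p)]


paras = [
    "As shown in Fig. 3, accuracy improves.",
    "Table 2 lists the results.",
    "Figure 30 shows ablations.",
    "See figure 3 for details.",
]
print(find_mention_paragraphs(paras, 3))
# ['As shown in Fig. 3, accuracy improves.', 'See figure 3 for details.']
```

A real pipeline would also need to handle cross-reference styles such as "Figs. 2 and 3" or LaTeX `\ref` labels, which this sketch does not attempt.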

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/