Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 1D3-GS-7-03

A Hallucination-Resistant Automatic Evaluation Metric for Image Captioning
*Kazuki MATSUDA, Yuiga WADA, Komei SUGIURA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In image captioning, automatic evaluation metrics that align closely with human judgment are crucial for effective model development. A key challenge is handling hallucinations, in which a model generates words unrelated to the image; this is a frequent problem in image captioning. Existing metrics often fail to handle hallucinations, primarily because they have limited ability to compare a candidate caption against a diverse range of reference captions. To overcome this, we propose DENEB, a novel image captioning metric that is specifically robust to hallucinations. DENEB incorporates the Sim-Vec Transformer, a mechanism that processes multiple references and extracts similarity vectors effectively. To train DENEB, we extended the Polaris dataset to create Polaris 2.0, which substantially improves the training data available for supervised automatic evaluation metrics. The dataset comprises 32,978 images and 32,978 human judgments from 805 annotators. Our approach achieved state-of-the-art performance on Composite, Flickr8K-Expert, Flickr8K-CF, PASCAL-50S, FOIL, and Polaris 2.0, demonstrating its effectiveness and robustness to hallucinations.
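
As an illustration of what "alignment with human judgment" can mean in practice, agreement between a metric and human ratings is commonly summarized with a rank correlation. The following is a minimal sketch, not the authors' evaluation code: the function name `kendall_tau_a` and the toy scores are hypothetical, and it computes the simple tau-a variant (ties counted as neither concordant nor discordant), which may differ from the variant used on any particular benchmark.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall's tau-a between metric scores and human judgments.

    Hypothetical helper for illustration only; tied pairs contribute
    to neither the concordant nor the discordant count.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two scored captions")

    concordant = 0
    discordant = 0
    # Compare every pair of captions: do the metric and the humans
    # rank the two captions in the same order?
    for i, j in combinations(range(n), 2):
        product = (metric_scores[i] - metric_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (made up): metric scores and human ratings for three captions.
tau = kendall_tau_a([0.1, 0.9, 0.5], [1, 2, 3])
```

A value of 1 means the metric orders every pair of captions as the human raters do, and a value of -1 means it reverses every pair.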

© 2024 The Japanese Society for Artificial Intelligence