Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 2G4-GS-6-01
Conference information

Automated Evaluation of Question Answering by Using Large Language Models and Its Defense Against Prompt Injections
*Takumi KONDOKoh TAKEUCHIJiyi LIShigeru SAITOHisashi KASHIMA
Author information
CONFERENCE PROCEEDINGS FREE ACCESS

Details
Abstract

In many question-answering tasks of Natural Language Processing, its evaluation is based on exact or partial matching between the candidate text and pre-prepared reference answers, regardless of the question domain. However, evaluating open-domain question-answering, which does not limit the range of topics, becomes problematic due to issues such as synonymous expressions and variations in notation, making accurate assessment challenging through lexical matching. Existing studies have proposed automated evaluation using Large Language Models (LLM) to address these challenges, but discussions on the vulnerability of automated evaluation are lacking. In this study, we propose a new framework for automated evaluation using LLM to address these issues and discuss its performance and robustness. Our experiments show that LLM's automated evaluation aligns with human evaluation in over 90% of cases and demonstrates resilience against attacks on the evaluation system.

Content from these authors
© 2024 The Japanese Society for Artificial Intelligence
Previous article Next article
feedback
Top