Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-53
Verification of Using LLM for Automating Dialogue Data Evaluation
*Yuki KUBO, Tomoya YAMASHITA, Masanori YAMADA
Abstract

Many methods exist for building dialogue systems, but evaluating dialogues remains challenging. Metrics such as dialogue quality are difficult to quantify and are often assessed by human judgement. Recently, methods that use LLMs to evaluate dialogue data have been proposed. LLM evaluations are fairly close to human judgements, but not yet close enough. The Elo rating system, which scores items through pairwise comparisons, is assumed not to depend on differences in standards between evaluators, and is therefore expected to improve evaluation accuracy. In some cases, however, it may not, for example when the distribution of evaluation values is biased. In this study, we examine whether the Elo rating system improves evaluation accuracy across various distributions of evaluation values.
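
For reference, a minimal sketch of how the Elo rating system can rank dialogue samples from pairwise comparisons, as described above. The sample IDs, the K-factor of 32, and the judgment tuples are illustrative assumptions, not the authors' implementation.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one pairwise comparison.

    score_a is 1.0 if A is judged better, 0.0 if B is, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta  # zero-sum update

# Hypothetical usage: rank three dialogue samples from pairwise judgments
# (e.g., produced by an LLM judge). All start at the same baseline rating.
ratings = {"dlg_1": 1500.0, "dlg_2": 1500.0, "dlg_3": 1500.0}
# Each tuple: (sample A, sample B, outcome for A).
judgments = [("dlg_1", "dlg_2", 1.0), ("dlg_2", "dlg_3", 0.5), ("dlg_1", "dlg_3", 1.0)]
for a, b, s in judgments:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], s)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because each update depends only on which of the two samples is judged better, not on any absolute score, the resulting ranking does not require evaluators to share a common rating scale.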

© 2024 The Japanese Society for Artificial Intelligence