Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 4Rin1-36

Automatic evaluation of open-domain dialogue systems using automatically-augmented references
*Yuma TSUTA, Naoki YOSHINAGA, Masashi TOYODA
Abstract

In open-domain dialogue, responses can vary in both content and style. This diversity is hard to account for when evaluating responses generated by dialogue systems, because typically only a single reference response can be extracted from real conversations for each context. ΔBLEU addresses this problem by augmenting the reference set with responses drawn from massive dialogue data, each manually annotated with its appropriateness as a response. However, because human annotation is costly, ΔBLEU cannot be applied to large-scale evaluation of open-domain dialogue systems, which must be evaluated across a wide variety of contexts. We propose ΔBLEU-auto, a fully automatic evaluation method that annotates the appropriateness of the augmented references with a classifier trained on automatically collected training data. Experimental results confirm that ΔBLEU-auto correlates with human judgment as well as ΔBLEU does, and that integrating ΔBLEU-auto into RUBER, a state-of-the-art evaluation method, further improves RUBER.
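To make the idea of appropriateness-weighted references concrete, the sketch below computes a ΔBLEU-style sentence score in which each hypothesis n-gram is credited with the best weighted clipped match over all references, so overlap with a negatively weighted (inappropriate) reference hurts the score. This is a simplified illustration of the mechanism, not the exact published ΔBLEU formula (no brevity penalty, sentence-level only); the function and variable names are ours, not from the paper.

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    """Counter of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def weighted_ref_bleu(hypothesis, weighted_refs, max_n=2):
    """Illustrative DeltaBLEU-style sentence score (simplified sketch).

    `weighted_refs` is a list of (reference_string, weight) pairs with
    weight in [-1, 1], where the weight encodes how appropriate the
    reference is as a response in this context.
    """
    hyp = hypothesis.split()
    refs = [(ref.split(), w) for ref, w in weighted_refs]
    log_precs = []
    for n in range(1, max_n + 1):
        hyp_grams = ngram_counts(hyp, n)
        total = sum(hyp_grams.values())
        if total == 0:
            return 0.0
        matched = 0.0
        for gram, count in hyp_grams.items():
            # Best weighted clipped count over the references that
            # actually contain this n-gram; a match found only in a
            # negatively weighted reference contributes negative credit.
            candidates = []
            for ref_tokens, w in refs:
                clip = min(count, ngram_counts(ref_tokens, n)[gram])
                if clip > 0:
                    candidates.append(w * clip)
            matched += max(candidates, default=0.0)
        prec = max(matched, 0.0) / total
        if prec == 0.0:
            return 0.0
        log_precs.append(math.log(prec))
    # Geometric mean of the weighted n-gram precisions.
    return math.exp(sum(log_precs) / max_n)
```

For example, a hypothesis that exactly matches a fully appropriate reference (weight 1.0) scores 1.0, the same hypothesis scored against the same reference down-weighted to 0.5 scores 0.5, and a hypothesis that overlaps only a negatively weighted reference scores 0.0.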

© 2020 The Japanese Society for Artificial Intelligence