Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Uncertainty-aware Automatic Evaluation Method for Open-domain Dialogue Systems
Yuma Tsuta, Naoki Yoshinaga, Masashi Toyoda

2023 Volume 30 Issue 2 Pages 531-556

Abstract

Because open-domain dialogues admit diverse valid responses, common reference-based metrics for text generation, such as BLEU, do not correlate well with human judgments unless an extensive set of high-quality reference responses is prepared for each input utterance. In this study, we propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. Our method first collects diverse reference responses from massive dialogue data, rates their quality with a neural network trained on automatically collected training data, and then computes a weighted BLEU over the automatically retrieved and rated reference responses. We also apply this method with an embedding-based metric, BERTScore, instead of the word-overlap-based BLEU, to absorb surface variations among the reference responses. Experimental results on the meta-evaluation of our method for dialogue systems, based on massive Twitter data, confirmed that it substantially improves the correlation of BLEU (and BERTScore) with human judgments. We also confirmed that our method remains effective when combined with a reference-free metric.
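The final scoring step the abstract describes, a BLEU average weighted by automatically assigned quality ratings, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reference retrieval and neural rating steps are assumed to have already produced (reference, weight) pairs, the function names are hypothetical, and a simple add-one-smoothed sentence BLEU stands in for the paper's exact BLEU variant.

```python
# Hypothetical sketch of quality-weighted BLEU over rated references.
# Upstream steps (reference retrieval, neural quality rating) are assumed.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=2):
    """Smoothed sentence-level BLEU of hyp against a single reference."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        # add-one smoothing avoids log(0) on short utterances
        log_prec += math.log((overlap + 1) / (max(sum(h.values()), 1) + 1))
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

def weighted_bleu(hyp, rated_refs):
    """Average of per-reference BLEU, weighted by each reference's rating."""
    total = sum(w for _, w in rated_refs)
    return sum(w * bleu(hyp, ref) for ref, w in rated_refs) / total
```

A usage example with two rated references, one judged much more appropriate than the other; a high-quality reference that overlaps the hypothesis dominates the score, while a low-rated reference contributes little:

```python
refs = [("see you tomorrow then".split(), 0.9),
        ("no idea what you mean".split(), 0.1)]
score = weighted_bleu("see you tomorrow".split(), refs)  # a value in (0, 1]
```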

© 2023 The Association for Natural Language Processing