Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 9, 2020 - June 12, 2020
In open-domain dialogues, the content and style of responses can vary widely. However, this diversity is difficult to account for when evaluating responses generated by dialogue systems, since typically only one response can be extracted from a real conversation as the reference response. To address this problem, ΔBLEU extends the set of reference responses with responses drawn from massive dialogue data, each manually annotated with its appropriateness as a response. Because this human annotation is costly, ΔBLEU cannot be used for large-scale evaluation of open-domain dialogue systems, which should be evaluated in various contexts. We propose a fully automatic evaluation method, ΔBLEU-auto, which annotates the appropriateness of the extended responses used in ΔBLEU with a classifier trained on automatically collected training data. Experimental results confirm that ΔBLEU-auto is comparable to ΔBLEU in terms of correlation with human judgment, and that integrating ΔBLEU-auto into RUBER, a state-of-the-art evaluation method, further improves RUBER.
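To illustrate the core idea behind ΔBLEU-style scoring, the following is a minimal toy sketch, not the exact formula from the original ΔBLEU work: each n-gram of a system response that matches a reference is weighted by the appropriateness rating (here assumed to lie in [-1, 1]) of the best-rated reference containing it, so that matches against inappropriate references are penalized. The function name and the simplified weighting scheme are our own illustrative assumptions.

```python
from collections import Counter

def delta_bleu_precision(hypothesis, rated_references, n=2):
    """Toy weighted n-gram precision in the spirit of ΔBLEU.

    rated_references: list of (reference_string, rating) pairs, where
    rating is a human (or, as in ΔBLEU-auto, classifier-predicted)
    appropriateness score in [-1, 1]. Illustrative sketch only; the
    published ΔBLEU formula differs in its clipping and normalization.
    """
    hyp = hypothesis.split()
    total, matched = 0.0, 0.0
    for k in range(1, n + 1):
        # count the hypothesis n-grams of order k
        hyp_ngrams = Counter(tuple(hyp[i:i + k])
                             for i in range(len(hyp) - k + 1))
        for ng, count in hyp_ngrams.items():
            # weight of the best-rated reference containing this n-gram
            weights = []
            for ref, w in rated_references:
                toks = ref.split()
                ref_ngrams = {tuple(toks[i:i + k])
                              for i in range(len(toks) - k + 1)}
                if ng in ref_ngrams:
                    weights.append(w)
            total += count
            if weights:
                matched += max(weights) * count
    return matched / total if total else 0.0
```

For example, with references `[("good answer", 1.0), ("bad answer", -0.5)]`, the response "good answer" scores higher than "bad answer", since matches against the negatively rated reference contribute negative weight.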