Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Selection of Evaluation Metrics for Dialogue Breakdown Detection in Dialogue Breakdown Detection Challenge 3
Yuiko Tsunomori, Ryuichiro Higashinaka, Tetsuro Takahashi, Michimasa Inaba

2020 Volume 35 Issue 1 Pages DSI-G_1-10


Dialogue breakdown detection, the task of determining whether a system utterance causes a dialogue to break down in a given dialogue context, has been actively researched in recent years. However, it remains unclear which evaluation metrics should be used to evaluate dialogue breakdown detectors, and this hinders progress in the field. In this paper, we seek appropriate metrics for evaluating the detectors in Dialogue Breakdown Detection Challenge 3. We first enumerate possible evaluation metrics and then rank them on the basis of system ranking stability and discriminative power. Using the runs submitted to Dialogue Breakdown Detection Challenge 3 (the participants' dialogue breakdown detection results), we found experimentally that RSNOD(NB,PB,B) is an appropriate metric for both English and Japanese, whereas NMD(NB,PB,B) and MSE(NB,PB,B) were found appropriate specifically for English and Japanese, respectively.
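
To make the notation concrete, the following is a minimal sketch of two of the distribution-based metrics named above. It is not taken from the paper. It assumes that each metric compares a detector's predicted probability distribution over the three labels NB (not a breakdown), PB (possible breakdown), and B (breakdown) with the gold distribution derived from annotators, that MSE is the mean squared difference over the three labels, and that NMD is the match distance (sum of absolute differences between cumulative distributions) divided by the number of labels minus one. The function names are hypothetical, and RSNOD is omitted.

```python
from itertools import accumulate

# Label order is assumed to be ordinal: NB < PB < B.
LABELS = ("NB", "PB", "B")


def mse(pred, gold):
    """Mean squared error between two label distributions (assumed definition)."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


def nmd(pred, gold):
    """Normalized match distance (assumed definition).

    Sums the absolute differences between the cumulative distributions and
    divides by (number of labels - 1), so the result lies in [0, 1].
    """
    cum_pred = accumulate(pred)
    cum_gold = accumulate(gold)
    match_distance = sum(abs(a - b) for a, b in zip(cum_pred, cum_gold))
    return match_distance / (len(gold) - 1)


if __name__ == "__main__":
    gold = (0.0, 0.0, 1.0)        # all annotators chose B
    far = (1.0, 0.0, 0.0)         # detector is certain of NB
    near = (0.0, 1.0, 0.0)        # detector is certain of PB
    print(mse(far, gold), mse(near, gold))
    print(nmd(far, gold), nmd(near, gold))
```

Under these assumed definitions, MSE gives the two wrong predictions in the usage example the same score, while NMD penalizes the prediction that is further from the gold label in the NB, PB, B ordering more heavily. This difference in behaviour is one reason the choice of metric can change how detectors are ranked.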

© The Japanese Society for Artificial Intelligence 2020