JSAI Technical Report, SIG-SLUD
Online ISSN : 2436-4576
Print ISSN : 0918-5682
71st (Sep, 2014)

Experimental Analysis for Automatic Evaluation of Open-domain Conversational Systems based on Large-scale Multi-references
Hiroaki SUGIYAMA, Toyomi MEGURO, Ryuichiro HIGASHINAKA

Pages 01-

Abstract

The evaluation of conversational systems that chat with people remains an open problem. Previous studies have evaluated such systems manually using ordinal scales such as the Likert scale. One limitation of this approach is that previously obtained evaluation values cannot be reused, since ordinal scales are not consistent across evaluations. This makes it difficult to compare a proposed system with previous ones, because the previous systems must be re-implemented and evaluated at the same time. We propose an automatic evaluation method for conversational systems that scores system-generated sentences on the basis of their similarities to a large set of reference sentences annotated with evaluation values. Our proposed method achieved a correlation coefficient of 0.514 with human judgments, while the correlation among human annotators was 0.783. Although a gap remains between the estimated and human-annotated values, the proposed method outperforms a baseline that uses BLEU scores as evaluation values. We also show that a correlation coefficient of 0.499 can be obtained by evaluating only 7% of all the data.
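To illustrate the general idea of reference-based evaluation described above, the following is a minimal sketch, assuming the estimated evaluation value is a similarity-weighted average of the scores annotated on the most similar references. The tokenization, the cosine similarity over bag-of-words vectors, and the k-nearest-reference weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: estimate an evaluation value for a system-generated sentence
# from (reference sentence, human-annotated score) pairs. All design choices
# here (whitespace tokenization, cosine similarity, top-k weighting) are
# illustrative assumptions rather than the method reported in the paper.
from collections import Counter
import math


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def estimate_score(candidate: str, references: list[tuple[str, float]], k: int = 10) -> float:
    """Score a candidate sentence as the similarity-weighted mean of the
    annotated scores of its k most similar reference sentences."""
    cand_vec = Counter(candidate.split())
    scored = [(cosine_similarity(cand_vec, Counter(ref.split())), value)
              for ref, value in references]
    top = sorted(scored, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0.0:
        # No lexical overlap with any reference: fall back to an unweighted mean.
        return sum(value for _, value in top) / len(top)
    return sum(sim * value for sim, value in top) / total_sim


# Toy usage with hypothetical references and scores on a 1-7 scale.
refs = [("that sounds like a fun trip", 6.0), ("i do not understand", 2.0)]
print(estimate_score("sounds like a really fun trip", refs, k=2))
```

A BLEU-based baseline of the kind mentioned in the abstract could be obtained by replacing the cosine similarity with an n-gram overlap score against the references, without using the annotated evaluation values as weights.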

© 2014 The Japanese Society for Artificial Intelligence