Proceedings of the Annual Conference of JSAI, 34th (2020)
Online ISSN: 2758-7347
Session ID: 3F5-ES-2-02
An Evaluation Method for Attention-based Dialog System
Khin Thet HTAR*, Yanan WANG, Jianming WU, Gen HATTORI, Aye THIDA
Abstract

Dialog systems are embedded in smartphones and Artificial Intelligence (AI) speakers and are widely used through text and speech. One challenge in achieving a human-like dialog system is the lack of a standard automatic evaluation metric. Existing metrics such as BLEU, METEOR, and ROUGE have been proposed to evaluate dialog systems; however, they are biased and correlate poorly with human judgements of response quality. RUBER, in contrast, not only trains a model of the relatedness between a given query and the reply generated by the dialog system, but also measures the similarity between the generated reply and the ground truth, and it shows a higher correlation with human judgements than BLEU and ROUGE. Building on RUBER, we explore replacing its static word embeddings with BERT contextualised word embeddings to obtain a better evaluation metric. Experimental results show that our BERT-based evaluation metric correlates more strongly with human judgements than RUBER.
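
The abstract does not include implementation details, but the referenced half of a RUBER-style metric can be illustrated as follows. This is a minimal sketch, not the authors' code, assuming the HuggingFace transformers and PyTorch libraries, the bert-base-uncased checkpoint, and mean pooling of the last-layer token embeddings; the actual pooling and model choices in the paper may differ.

# Minimal sketch of the referenced part of a RUBER-style metric with
# BERT contextualised embeddings (assumed setup, not the authors' code).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def sentence_embedding(text: str) -> torch.Tensor:
    # Encode the sentence and mean-pool the last-layer token embeddings
    # into a single contextualised sentence vector.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

def referenced_score(ground_truth: str, generated_reply: str) -> float:
    # Cosine similarity between the ground-truth reply and the
    # system-generated reply, as in RUBER's referenced metric.
    ref = sentence_embedding(ground_truth)
    hyp = sentence_embedding(generated_reply)
    return torch.nn.functional.cosine_similarity(ref, hyp, dim=0).item()

print(referenced_score("i had a great weekend hiking",
                       "my weekend hike was wonderful"))

The unreferenced half of the metric, which scores the relatedness between the query and the generated reply, would additionally require a trained relatedness model and is not sketched here.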

© 2020 The Japanese Society for Artificial Intelligence