JSAI SIG Technical Reports: Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)
Online ISSN : 2436-4576
Print ISSN : 0918-5682
71st Meeting (September 2014)

An Experimental Study toward the Automatic Evaluation of Chat-oriented Dialogue Systems Based on Large-scale Multiple References
Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka

p. 01-

Abstract

The evaluation of conversational systems that chat with people remains an open problem. Some studies have evaluated such systems by hand using ordinal scales such as the Likert scale. One limitation of this approach is that previously obtained evaluation values cannot be reused, since ordinal scales are not consistent across evaluations. This makes it difficult to compare a proposed system with previous ones, because the previous systems must be reimplemented and evaluated at the same time. We propose an automatic evaluation method for conversational systems that scores system-generated sentences on the basis of similarities calculated against many reference sentences annotated with evaluation values. Our proposed method's correlation coefficient with human judgments reached 0.514, while the correlation among human annotators was 0.783. Although a gap remains between the estimated and human-annotated values, the proposed method outperforms a baseline method that uses BLEU scores as evaluation values. We also show that a correlation coefficient of 0.499 can be obtained by evaluating just 7% of the data.
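The abstract describes the method only at a high level. As a rough illustration, the following Python sketch estimates an evaluation value for a system utterance as a similarity-weighted average of the annotated scores of its most similar reference utterances. The bag-of-words cosine similarity, the choice of k, and the sample data are illustrative assumptions, not the authors' exact formulation.

from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words token-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def estimate_score(candidate: str,
                   references: list[tuple[str, float]],
                   k: int = 10) -> float:
    """Estimate an evaluation value for a system sentence as the
    similarity-weighted mean of the annotated scores of its k most
    similar reference sentences (references must be non-empty)."""
    sims = [(cosine_similarity(candidate, ref), score)
            for ref, score in references]
    top = sorted(sims, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0.0:
        # No lexical overlap with any reference: fall back to a plain mean.
        return sum(score for _, score in top) / len(top)
    return sum(sim * score for sim, score in top) / total

# Hypothetical references annotated on a 1-7 scale.
references = [
    ("that sounds like a lot of fun", 6.5),
    ("i see", 3.0),
    ("i do not understand", 1.5),
]
print(estimate_score("sounds like fun to me", references))

Weighting by similarity, rather than copying the score of the single nearest reference, makes the estimate less sensitive to any one noisy annotation; how the actual paper weights and selects references is not specified in the abstract.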

© 2014 The Japanese Society for Artificial Intelligence (JSAI)