Host: The Japanese Society for Artificial Intelligence
Name: 72nd SIG-SLUD
Number: 72
Location: [in Japanese]
Date: December 15, 2014 - December 16, 2014
Pages: 06-
Evaluation measures for chat-oriented dialogue systems are required in order to improve such systems effectively. Some studies have evaluated systems with several arbitrarily defined measures; however, whether those measures are appropriate has not been examined. We analyze evaluation measures for chat-oriented dialogue systems using the semantic differential method. Our analysis shows that, for each evaluator, the evaluation measures cluster into four factors: two factors common to all evaluators, one factor that is similar across evaluators, and one factor specific to the individual evaluator. We also develop an automatic evaluation system that estimates the score of each evaluation measure defined in the semantic differential. Our experiment shows that, for most measures, the correlation coefficients between the system's estimates and human scores are similar to those between human evaluators.
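The final claim compares two kinds of agreement for each evaluation measure: how well the automatic estimates correlate with human scores, and how well two human evaluators correlate with each other. The sketch below shows one way to make that comparison. It assumes Pearson correlation computed per measure and assumes the system is compared with each human evaluator and the two values averaged. The function names and the toy scores are hypothetical illustrations, not the authors' method or data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant score list; correlation is undefined")
    return cov / (sx * sy)


def compare_to_human_agreement(human_a, human_b, system):
    """Return (human-human r, system-human r) for ONE evaluation measure.

    Assumption (hypothetical): the system is correlated with each human
    evaluator separately and the two coefficients are averaged.
    """
    inter_human = pearson(human_a, human_b)
    system_human = (pearson(system, human_a) + pearson(system, human_b)) / 2
    return inter_human, system_human


if __name__ == "__main__":
    # Toy scores for one measure over five dialogues (illustrative only).
    human_a = [5, 3, 6, 2, 4]
    human_b = [4, 3, 6, 1, 5]
    system = [4.5, 3.2, 5.8, 2.1, 4.4]
    r_hh, r_sh = compare_to_human_agreement(human_a, human_b, system)
    print(f"human-human r = {r_hh:.3f}, system-human r = {r_sh:.3f}")
```

Running this for every measure and checking whether the system-human value is close to the human-human value mirrors the kind of comparison the abstract describes; a real study might instead use a library routine such as scipy.stats.pearsonr.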