Host: The Japanese Society for Artificial Intelligence
Name : 73rd SIG-SLUD
Number : 73
Location : [in Japanese]
Date : March 09, 2015
Pages : 02-
A spoken dialogue system should respond quickly after a user finishes speaking, but doing so often causes user utterances to be incorrectly segmented because of erroneous voice activity detection. We previously developed a method that restores such incorrectly segmented utterances a posteriori. A crucial part of that method is classifying whether restoration is required. In this paper, we improve the accuracy of this classification by adapting it to each user. We focus on each user's speaking tempo, which can be obtained during dialogues. We show that users' speaking tempos correlate with the thresholds that are appropriate for them in the classification, and we derive a linear regression function that converts a user's tempo into a threshold. We apply this adaptation to two classifiers: one that simply uses a threshold and one based on decision tree learning. Experimental results showed that the proposed user adaptation improved the classification accuracy of the two classifiers by 3.3% and 2.1%, respectively.
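The tempo-to-threshold adaptation for the simple threshold classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names (`fit_tempo_to_threshold`, `needs_restoration`) are hypothetical. The sketch assumes that each training user has a measured speaking tempo and a tuned "appropriate" threshold, and that the fit is ordinary least squares. It also assumes that the thresholded feature is something like the pause length between two consecutive segments, with a pause shorter than the threshold taken as a sign that one utterance was split. The abstract does not state the feature, the tempo unit, or the comparison direction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearMap:
    """threshold = slope * tempo + intercept"""
    slope: float
    intercept: float

    def __call__(self, tempo: float) -> float:
        return self.slope * tempo + self.intercept


def fit_tempo_to_threshold(tempos: Sequence[float],
                           thresholds: Sequence[float]) -> LinearMap:
    """Ordinary least-squares fit of per-user thresholds against per-user tempos
    (hypothetical helper; the regression setup is an assumption)."""
    n = len(tempos)
    if n != len(thresholds) or n < 2:
        raise ValueError("need at least two (tempo, threshold) pairs")
    mean_x = sum(tempos) / n
    mean_y = sum(thresholds) / n
    sxx = sum((x - mean_x) ** 2 for x in tempos)
    if sxx == 0:
        raise ValueError("tempos must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tempos, thresholds))
    slope = sxy / sxx
    return LinearMap(slope, mean_y - slope * mean_x)


def needs_restoration(feature: float, threshold: float) -> bool:
    """Threshold classifier. ASSUMPTION: the feature is a pause length between
    segments, and a value below the threshold means restoration is required."""
    return feature < threshold


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the paper.
    mapping = fit_tempo_to_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    new_user_threshold = mapping(4.0)  # threshold for an unseen user's tempo
    print(new_user_threshold, needs_restoration(5.0, new_user_threshold))
```

With the per-user threshold predicted from tempo, a new user's classifier can be adapted as soon as their tempo has been measured during the dialogue, without collecting labelled data from that user. The decision-tree variant is not sketched here, because the abstract does not say how the predicted threshold enters the tree.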