2020 Volume 27 Issue 1 Pages 65-88
Our commercial chat-oriented dialogue system generates utterances from an utterance database built from a massive number of predicate-argument structures extracted from the web. However, because building this database involves several automated processes, it often includes non-sentences (ungrammatical or uninterpretable sentences) and utterances with inappropriate topic information (called off-focus utterances). In addition, the utterances tend to be monotonous and uninformative because each is created from a single predicate-argument structure. To resolve these problems, we propose two filters: one that removes non-sentences using neural network-based methods, and one that removes utterances inappropriate for their associated foci using co-occurrence statistics. To reduce monotony, we also propose a method for concatenating automatically generated utterances so that they become longer and richer in content. Experimental results indicate that the non-sentence filter removes non-sentences with an accuracy of 95% and that the focus filter removes utterances inappropriate for their foci with high recall. We also examine the effectiveness of the filtering and concatenation methods through an experiment involving human participants. The results indicate that our methods significantly outperform a baseline in terms of understandability and that concatenating two utterances leads to higher familiarity and content richness while retaining understandability.
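As an illustration of how a focus filter based on co-occurrence statistics might operate, the following is a minimal sketch. The abstract does not state which statistic is used, so pointwise mutual information (PMI) over document-level co-occurrence is an assumption here, and the function names (`build_counts`, `pmi`, `is_on_focus`), the threshold, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(documents):
    """Count document frequencies of words and of unordered word pairs."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in documents:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return word_df, pair_df, len(documents)


def pmi(a, b, word_df, pair_df, n_docs):
    """PMI of two words; -inf when they never co-occur."""
    joint = pair_df[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log((joint * n_docs) / (word_df[a] * word_df[b]))


def is_on_focus(focus, utterance_tokens, counts, threshold=0.0):
    """Keep an utterance if some content word is associated with the focus.

    Assumption: one sufficiently associated word is enough; a real filter
    could aggregate scores differently.
    """
    word_df, pair_df, n_docs = counts
    scores = [
        pmi(focus, w, word_df, pair_df, n_docs)
        for w in utterance_tokens
        if w != focus
    ]
    return bool(scores) and max(scores) > threshold


# Toy corpus (hypothetical), already tokenized.
corpus = [
    ["ramen", "eat", "noodle"],
    ["ramen", "eat", "soup"],
    ["car", "drive", "road"],
    ["car", "drive", "engine"],
]
counts = build_counts(corpus)
on_focus = is_on_focus("ramen", ["ramen", "eat"], counts)
off_focus = is_on_focus("ramen", ["drive", "road"], counts)
```

In this sketch, an utterance about eating is kept for the focus "ramen", while one about driving is rejected because none of its words co-occur with the focus in the toy corpus. Lowering the threshold would favor recall of on-focus utterances at the cost of letting more off-focus ones through.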