1999 Volume 14 Issue 3 Pages 455-465
We built a system that interactively explains a machine and its installation using both linguistic and visual information. The system provides two frameworks for generating interactive multimodal explanations. The first framework uses formatted text and pictures. It is interactive in that it allows users to ask follow-up questions and give replies using spoken language combined with pointing gestures. The second framework uses spoken explanation coordinated with pointing gestures and animation. This explanation reproduces instruction dialogues conducted by experts in face-to-face situations. This paper describes the explanation generation mechanisms of this interactive multimodal explanation system. It explains in detail the mechanisms needed to achieve interactive features, such as accepting follow-up questions, and the way of handling the temporality that arises from the use of temporal media such as spoken language. In addition, by comparing the two frameworks of multimodal explanation and their generation mechanisms, we propose an appropriate organization of processing modules and knowledge sources for multimodal explanation generation. The description of the object being explained and the means available for explanation, such as an utterance realizer, can and should be shared between the frameworks, provided that the design takes into account that they are used in different ways. The explanation generation mechanisms themselves, however, should be designed specifically for each framework. The difficulty of sharing the generation mechanisms stems from two facts. First, the frameworks diverge in the rhetorical devices that can be used and in the factors that must be considered to make explanations easy to understand. Second, the way of making an explanation interactive differs in principle between the two frameworks.