1998, Vol. 13, No. 2, pp. 212-220
Two requirements must be met to develop a practical multimodal interface system: (1) integration of data that arrive with delays, and (2) elimination of ambiguity in the recognition results of each modality. This paper presents an efficient and generic methodology for interpreting multimodal input that satisfies both requirements. By regarding the multimodal interpretation process as hypothetical reasoning and formalizing the control mechanism of interpretation on the basis of the ATMS (Assumption-based Truth Maintenance System), the method integrates delayed-arrival data and efficiently interprets multimodal input that contains ambiguity. The proposed method is incorporated into an interface agent system that accepts multimodal input consisting of voice and direct-indication gestures on a touch display. The system communicates with the user through the interface agent's 3D animated figure, using facial expressions, gestures, and synthesized voice.
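The abstract does not give implementation details, but the core idea can be illustrated with a minimal sketch: each modality's recognition results are treated as timestamped, scored assumptions, candidate interpretations are environments (assumption sets), known-inconsistent combinations are nogoods, and late-arriving data is handled by re-running interpretation over all hypotheses received so far. All names here (`Hypothesis`, `NOGOODS`, `interpret`, the `window` parameter) are hypothetical and not the paper's actual API.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative ATMS-style integration of ambiguous multimodal input.
# This is a sketch under assumed interfaces, not the paper's method.

@dataclass(frozen=True)
class Hypothesis:
    modality: str   # e.g. "voice" or "gesture"
    value: str      # recognized symbol, e.g. "delete" or "object-3"
    t: float        # occurrence/arrival time in seconds
    score: float    # recognizer confidence in [0, 1]

# Nogoods: assumption combinations known to be inconsistent. A full ATMS
# records these incrementally; here they are given as a fixed set.
NOGOODS = {frozenset({("voice", "delete"), ("gesture", "empty-region")})}

def consistent(env):
    """An environment (set of assumptions) holds unless it contains a nogood."""
    keys = frozenset((h.modality, h.value) for h in env)
    return not any(ng <= keys for ng in NOGOODS)

def interpret(streams, window=1.5):
    """Combine one hypothesis per modality into ranked interpretations.

    Delayed arrival is handled simply: interpretation is re-run over all
    hypotheses received so far, and hypotheses whose timestamps differ by
    more than `window` seconds are not integrated.
    """
    candidates = []
    for env in product(*streams.values()):
        times = [h.t for h in env]
        if max(times) - min(times) > window:
            continue                 # too far apart in time to co-refer
        if not consistent(set(env)):
            continue                 # ruled out by a nogood
        score = 1.0
        for h in env:
            score *= h.score         # naive independence assumption
        candidates.append((score, env))
    return sorted(candidates, key=lambda c: c[0], reverse=True)

if __name__ == "__main__":
    streams = {
        "voice":   [Hypothesis("voice", "delete", 0.2, 0.8),
                    Hypothesis("voice", "move",   0.2, 0.4)],
        "gesture": [Hypothesis("gesture", "object-3", 0.9, 0.9)],
    }
    for score, env in interpret(streams):
        print(f"{score:.2f}", [(h.modality, h.value) for h in env])
```

Keeping all consistent environments alive, rather than committing to the single best hypothesis per modality, is what lets an ATMS-style controller revise its interpretation cheaply when a delayed or corrected recognition result arrives.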