Recent improvements in speech recognition and natural language processing enable dialogue systems to handle spontaneous speech. To support such systems, multi-modal man-machine interfaces have been widely introduced. We have been aiming at the realization of a robust dialogue system that uses spontaneous speech as its main input modality. Although our conventional system was built with a robust natural language interpreter, its user interface relied on speech alone, and the system therefore did not always provide sufficient usability. For instance, response sentences became too long when they contained a large amount of information, which could cause the user to miss part of the response speech. Furthermore, the user could not learn how a particular word in the response, e.g. a place name, should be written in Kanji characters, and it was very difficult to specify a position on the map through speech. These examples clearly indicate that a speech-only user interface does not always give the user enough information and may cause problems. To solve these problems and realize more natural human-machine interaction, we have developed a multi-modal sightseeing guidance system with 1) speech input/output, 2) touch-screen input (on a map or in a menu), and 3) graphical/text output (map, photographs, menus, and dialogue history). Furthermore, we implemented an agent interface with a real face image / animation and recorded / synthesized speech, and carried out evaluation experiments consisting of task completions and questionnaires to evaluate the interface and the whole system. In this paper, we describe the system and the evaluation experiments.