Abstract
In this paper, we describe a corpus of spontaneous speech between humans and a machine, collected using the Wizard of Oz (WOZ) technique. The corpus is designed to support analysis of the elements that enable natural interaction between human and machine, such as natural turn-taking during the dialog, chiming in, interruption, and natural recovery from interruption. It contains data from 40 speakers across 197 sessions, and the net time spent on dialog sessions exceeds 1,300 min. The corpus includes speech waveforms, pitch patterns, transcriptions, utterance segment boundaries, and semantic representations of the user utterances.