Abstract
This paper describes a large-scale spoken language corpus of simultaneous interpreting, constructed at the Center for Integrated Acoustic Information Research (CIAIR), Nagoya University. Among other features, the corpus has the following characteristics: (1) English and Japanese speeches are recorded in parallel, (2) the data contain both monologue and dialogue speech, and (3) the exact start and end times are provided for each utterance. We have collected about 65 hours of speech data in total and transcribed them into ASCII text files (about 367,000 morphemes in 22,000 utterance units). This paper also outlines the software tools we have developed for investigating the corpus. The corpus will be made publicly available in the near future.