Abstract
This article describes the CallHome Japanese (CHJ) corpus and a project to annotate it with several kinds of linguistic tags. The CHJ corpus is a collection of digitized speech data and text transcriptions of 120 spontaneous, unscripted telephone conversations. The annotation provides word segmentation, part-of-speech tags, and alignment with the speech for all words, as well as semantic classes for nouns and verbs and argument structures for verbs. A large-scale, high-quality corpus of naturally occurring conversations with such extensive linguistic annotation will provide a basis for scientific and technological investigation into human speech communication.