Abstract
A large-scale corpus of spontaneous Japanese monologue is being compiled as a joint project of the National Language Research Institute (under the Agency for Cultural Affairs) and the Communications Research Laboratory (under the Ministry of Posts and Telecommunications). The corpus will contain about 700 hours of digitized speech (about 7 million morphemes), its transcription, and various annotations such as part-of-speech (POS) tags. Phonological labels, both segmental and prosodic, will be provided for a subset of the corpus. The corpus will become publicly available in the spring of 2004.