The CallHome Japanese Corpus (Special Issue: Trends in Speech Research Databases)

Yasuharu Den; John Fry

doi:10.24467/onseikenkyu.4.2_24

Abstract

This article describes the CallHome Japanese (CHJ) corpus and a project to annotate it with various kinds of linguistic tags. The CHJ corpus is a collection of digitized speech data and text transcriptions of 120 spontaneous, unscripted telephone conversations. The annotation provides word segmentation, part-of-speech tags, and alignment with the speech signal for all words; semantic classes for nouns and verbs; and argument structures for verbs. A large-scale, high-quality corpus of naturally occurring conversations with such extensive linguistic annotation will provide a basis for scientific and technological research on human speech communication.
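To make the annotation layers listed above concrete, here is a minimal Python sketch of how a single annotated word might be represented. The class `AnnotatedWord`, the helper `verbs_with_arguments`, all field names, and the tag values are hypothetical illustrations, not the actual CHJ file format. The semantic class and argument fields are optional because the abstract states that semantic classes are given only for nouns and verbs, and argument structures only for verbs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical record layout for one annotated word; NOT the real CHJ format.
@dataclass
class AnnotatedWord:
    surface: str                 # word as segmented from the transcript
    pos: str                     # part-of-speech tag (tag set assumed)
    start_sec: float             # speech alignment: start time in seconds
    end_sec: float               # speech alignment: end time in seconds
    semantic_class: Optional[str] = None  # present only for nouns and verbs
    arguments: Dict[str, str] = field(default_factory=dict)  # verbs only: role -> filler

    def duration(self) -> float:
        """Length of the aligned speech segment in seconds."""
        return self.end_sec - self.start_sec


def verbs_with_arguments(words: List[AnnotatedWord]) -> List[AnnotatedWord]:
    """Return the verbs that carry an argument-structure annotation."""
    return [w for w in words if w.pos == "verb" and w.arguments]


# Illustrative utterance (romanized); times and labels are invented for the example.
words = [
    AnnotatedWord("hon", "noun", 0.10, 0.35, semantic_class="artifact"),
    AnnotatedWord("o", "particle", 0.35, 0.40),
    AnnotatedWord("yomu", "verb", 0.40, 0.80, semantic_class="reading",
                  arguments={"object": "hon"}),
]
```

Keeping every layer as a field on the word record mirrors the abstract's description of annotation attached to individual words; a real corpus may instead store each layer in a separate file keyed by word position.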
