A Method for Transforming Utterance Units into Language Processing Units by Dividing or Connecting Them

竹沢 寿幸; 森元 逞

doi:10.5715/jnlp.6.2_83

Abstract

Utterance units given as input to speech translation or spoken dialogue systems that handle spontaneous speech are not always sentences, whereas language translation operates on sentences. Because what constitutes a sentence in spoken language is not yet well understood, we use the term “meaningful chunks” instead of sentences. First, using conventionally interpreted dialogue data, we show that an utterance unit sometimes needs to be divided into several meaningful chunks, and that several utterance units sometimes need to be connected to form a single meaningful chunk. Next, we propose a method for transforming utterance units into meaningful chunks based on pause information and N-grams of fine-grained part-of-speech subcategories. Experiments confirmed that the method yields good results.
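The abstract does not give the formulation, so the following is only a minimal sketch of how pause information and part-of-speech N-grams could be combined to place chunk boundaries. It assumes a bigram (N = 2) boundary model with add-alpha smoothing and a fixed score bonus for long pauses. The function names `train` and `segment`, the tag names, and all threshold values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def train(chunks, alpha=1.0):
    """Estimate P(boundary | previous tag, next tag) from chunked data.

    `chunks` is a list of non-empty meaningful chunks in dialogue order,
    each a list of part-of-speech tags. Assumption: a bigram model with
    add-alpha smoothing; the paper's actual N and smoothing are not
    stated in the abstract.
    """
    boundary, total = Counter(), Counter()
    for i, chunk in enumerate(chunks):
        # Adjacent tags inside a chunk are non-boundary examples.
        for a, b in zip(chunk, chunk[1:]):
            total[(a, b)] += 1
        # The pair spanning two consecutive chunks is a boundary example.
        if i + 1 < len(chunks):
            pair = (chunk[-1], chunks[i + 1][0])
            boundary[pair] += 1
            total[pair] += 1

    def prob(a, b):
        return (boundary[(a, b)] + alpha) / (total[(a, b)] + 2 * alpha)

    return prob


def segment(tokens, prob, threshold=0.5, pause_threshold=0.3, pause_bonus=0.2):
    """Split a token stream into chunks.

    `tokens` is a list of (pos_tag, pause_before_in_seconds) pairs.
    A boundary is placed where the bigram score, raised by a hypothetical
    fixed bonus when the preceding pause is long, exceeds `threshold`.
    """
    chunks, current = [], []
    for tag, pause in tokens:
        if current:
            score = prob(current[-1], tag)
            if pause >= pause_threshold:
                score += pause_bonus
            if score > threshold:
                chunks.append(current)
                current = []
        current.append(tag)
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a pause alone does not force a boundary, so utterance units separated by a pause can still be connected into one chunk, while a strongly boundary-like tag pair can divide an utterance unit even where no pause occurs.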
