Style Formatting of Lecture Transcriptions Using Statistical Methods

下岡 和也; 南條 浩輝; 河原 達也

doi:10.5715/jnlp.11.2_67

Abstract

Transcriptions and speech recognition results of lectures contain many expressions peculiar to spoken language, so they must be transformed into document style before they can be put to practical use. We apply the statistical approach used in machine translation to the automatic transformation of spoken language into document-style sentences. The transformation covers deletion of fillers, insertion of periods, insertion of particles, conversion to written expressions, and unification of the end-of-sentence style. A beam search is introduced to apply these processes in an integrated manner. Experimental evaluation using real lecture transcriptions confirms that the statistical transformation framework works well, and we achieved high recall and precision rates for period and particle insertion.
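
The integrated search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the filler list, the probabilities, and the function names `expand` and `beam_search` are all hypothetical, and only two of the five operations (filler deletion and period insertion) are shown. Each input word is expanded into candidate outputs with a log-probability, and only the best few partial hypotheses are kept at each step.

```python
import math

# Hypothetical filler inventory (English stand-ins for illustration only).
FILLERS = {"uh", "um"}

# Hypothetical log-probabilities standing in for trained statistical models.
LOG_P_DELETE_FILLER = math.log(0.9)
LOG_P_KEEP_FILLER = math.log(0.1)
LOG_P_PERIOD = {"today": math.log(0.7)}  # P(a period follows this word)
DEFAULT_LOG_P_PERIOD = math.log(0.05)


def expand(word):
    """Yield (output_tokens, log_prob) candidates for one input word."""
    if word in FILLERS:
        yield [], LOG_P_DELETE_FILLER      # delete the filler
        yield [word], LOG_P_KEEP_FILLER    # keep the filler
        return
    lp = LOG_P_PERIOD.get(word, DEFAULT_LOG_P_PERIOD)
    yield [word, "."], lp                          # insert a period
    yield [word], math.log1p(-math.exp(lp))        # no period: log(1 - p)


def beam_search(words, beam_width=3):
    """Return the highest-scoring output token sequence."""
    beam = [([], 0.0)]  # list of (output tokens so far, total log-prob)
    for word in words:
        candidates = []
        for out, score in beam:
            for tokens, lp in expand(word):
                candidates.append((out + tokens, score + lp))
        # Keep only the beam_width best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][0]


if __name__ == "__main__":
    spoken = ["um", "we", "start", "today", "uh", "next", "topic"]
    print(" ".join(beam_search(spoken)))
```

In this toy version every decision is scored independently of its context, so a greedy choice would give the same result. A beam becomes useful once the scores depend on neighbouring words, as they would with a language model over the output sentence.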
