2004 Volume 11 Issue 2 Pages 67-83
Transcriptions and speech recognition results of lectures contain many expressions peculiar to spoken language, so they must be transformed into document style before they can be put to practical use. We apply the statistical approach used in machine translation to the automatic transformation of spoken language into document-style sentences. The transformation covers deletion of fillers, insertion of periods, insertion of particles, conversion to written expressions, and unification of the end-of-sentence style. A beam search is introduced to apply these processes in an integrated manner. Experimental evaluation using real lecture transcriptions confirms that the statistical transformation framework works well, achieving high recall and precision rates for period and particle insertion.
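
To illustrate how a beam search can combine several transformations under one statistical score, the following is a minimal sketch, not the authors' implementation. It assumes a noisy-channel style score (a transformation log-probability plus a bigram language-model log-probability over the written-style output) and uses toy English tokens. The tables `TRANSFORMS` and `BIGRAM_LOGP`, the constant `UNSEEN_LOGP`, and every probability value are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical transformation table: spoken token -> [(output tokens, log prob)].
# An empty output tuple models deletion (e.g. of a filler).
TRANSFORMS = {
    "uh": [((), math.log(0.9)), (("uh",), math.log(0.1))],
    "gonna": [(("going", "to"), math.log(0.8)), (("gonna",), math.log(0.2))],
}

# Hypothetical bigram log-probabilities of a written-style language model.
BIGRAM_LOGP = {
    ("<s>", "we"): math.log(0.5),
    ("we", "are"): math.log(0.5),
    ("are", "going"): math.log(0.4),
    ("going", "to"): math.log(0.6),
    ("to", "start"): math.log(0.3),
}
UNSEEN_LOGP = math.log(0.01)  # assumed flat penalty for unseen bigrams


def lm_logp(prev, word):
    """Bigram log-probability, falling back to the flat penalty."""
    return BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)


def candidates(token):
    """Candidate outputs for one spoken token; unknown tokens pass through."""
    return TRANSFORMS.get(token, [((token,), 0.0)])


def beam_search(tokens, beam_width=3):
    """Return (best written-style token list, its log score)."""
    beam = [(0.0, ("<s>",))]  # each hypothesis: (score, output so far)
    for token in tokens:
        expanded = []
        for score, output in beam:
            for out_tokens, t_logp in candidates(token):
                new_score = score + t_logp
                new_output = output
                # Score every emitted word against the previous output word.
                for word in out_tokens:
                    new_score += lm_logp(new_output[-1], word)
                    new_output = new_output + (word,)
                expanded.append((new_score, new_output))
        # Keep only the highest-scoring hypotheses.
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_width]
    best_score, best_output = beam[0]
    return list(best_output[1:]), best_score


words, score = beam_search(["we", "uh", "are", "gonna", "start"])
print(words, score)
```

Because filler deletion and written-form conversion are scored in the same search, a choice made for one token can be outweighed by its effect on later bigrams. Period or particle insertion could be added under the same assumptions by letting `candidates` also emit outputs that contain an extra token.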