A Study on Speech Content Estimation Using Lip Movement and Color Information

本田 悠将; 中村 悦郎; 景山 陽一; 廣瀬 聡

doi:10.14864/fss.37.0_499

Abstract

An automatic minutes-generation system based on speech recognition technology has the potential to significantly improve the efficiency of recording speech in meetings and business settings. Because such a system works best with high-quality audio as its voice source, the usage environment must be prepared in advance according to the number of participants. Moreover, ambient sound other than human voices is the main factor that lowers speech recognition accuracy. Lip motion during speech, on the other hand, retains speech-specific features even in a noisy environment. This suggests that estimating speech content from lip movements can help improve speech recognition accuracy. Our research examines a method for correcting speech recognition results using image information, with the aim of improving the speech recognition accuracy of automatic minutes-generation systems. In this study, we examined a method of speech content estimation using lip movements and lip color information.
