Host: The Institute of Image Electronics Engineers of Japan
This paper describes an automatic vision-based spoken word recognition system that utilizes, instead of the audio signal, a visual motion signal obtained from video of the region around the mouth during speech. Motion information for each pixel in the input image sequence was obtained by computing optical flow, and feature values representing the spatial distribution of pixel-wise velocities were extracted for each frame. The start and end times of each spoken word were detected from these velocity features, and a high-dimensional feature vector describing the temporal variation of the velocity distribution over the utterance was constructed. As a preliminary evaluation of the proposed features for spoken word recognition, a discrimination test on five spoken words, including A-RI-GA-TO-U and KO-N-NI-CHI-WA, was conducted, and fairly promising results were achieved.
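The pipeline described above (per-pixel optical flow, then a scalar motion feature per frame that can drive utterance endpoint detection) can be sketched as follows. This is not the authors' implementation: it uses a basic dense Lucas-Kanade solver written in NumPy, and the helper names (`box_sum`, `lucas_kanade`, `frame_feature`), the window size, and the synthetic test frames are all illustrative assumptions.

```python
import numpy as np

def box_sum(a, win):
    """Sum over a win x win neighbourhood of each pixel (integral-image trick)."""
    pad = win // 2
    ap = np.pad(a, pad + 1, mode="edge")
    ii = ap.cumsum(axis=0).cumsum(axis=1)
    h, w = a.shape
    return (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
            - ii[win:win + h, :w] + ii[:h, :w])

def lucas_kanade(prev, curr, win=9):
    """Dense per-pixel velocity (u, v) between two grayscale frames.

    Solves the Lucas-Kanade normal equations in each win x win window:
        [Sxx Sxy; Sxy Syy] [u; v] = -[Sxt; Syt]
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    Iy, Ix = np.gradient(prev)          # spatial gradients
    It = curr - prev                    # temporal gradient
    Sxx = box_sum(Ix * Ix, win)
    Syy = box_sum(Iy * Iy, win)
    Sxy = box_sum(Ix * Iy, win)
    Sxt = box_sum(Ix * It, win)
    Syt = box_sum(Iy * It, win)
    det = Sxx * Syy - Sxy * Sxy
    safe = np.abs(det) > 1e-9           # avoid division in flat regions
    d = np.where(safe, det, 1.0)
    u = np.where(safe, (-Syy * Sxt + Sxy * Syt) / d, 0.0)
    v = np.where(safe, (-Sxx * Syt + Sxy * Sxt) / d, 0.0)
    return u, v

def frame_feature(u, v):
    """Scalar per-frame motion feature: mean flow magnitude.

    Thresholding this value over time is one simple way to locate the
    start and end of an utterance (an illustrative choice, not the paper's).
    """
    return float(np.hypot(u, v).mean())

# Synthetic stand-in for two mouth-region frames: a bright blob that
# shifts by one pixel between frames.
y, x = np.mgrid[0:64, 0:64]
blob = lambda cy, cx: np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / 50.0)
f0, f1 = blob(32, 30), blob(32, 31)

u, v = lucas_kanade(f0, f1)
moving = frame_feature(u, v)            # motion present -> feature > 0
u0, v0 = lucas_kanade(f0, f0)
still = frame_feature(u0, v0)           # identical frames -> feature == 0
print(moving, still)
```

Stacking `frame_feature` values (or a spatially binned version of the flow field) across the frames between the detected start and end points would yield the kind of high-dimensional utterance descriptor the abstract refers to.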