Abstract
Lip reading is a common method by which deaf people understand other people's speech from visual information. Acquiring the technique, however, requires great effort and a long time, and no educational system for teaching it has yet been established.
This paper describes an attempt to realize lip reading using image processing and pattern recognition techniques, with the aim of clarifying the possibilities and limitations inherent in lip reading. Although our final goal is the realization of speech understanding, this paper deals only with the recognition of vowels and words of the Japanese language as a first step.
The front or side view of the mouth was taken with a TV camera, and several feature values of the lip shape were extracted. Discrimination of the five vowels was performed by the maximum-likelihood method, and more than about 80% of the vowels were correctly discriminated. Moreover, word recognition based upon the vowel discrimination was performed. The result showed a remarkable capability of lip reading for a small number of words. Finally, several problems are discussed in relation to the actual lip reading of the deaf.
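As an illustration of the vowel discrimination step, the following is a minimal sketch of maximum-likelihood classification, not the implementation used in this work. It assumes each vowel class is modelled by an independent Gaussian per feature, and it assumes two hypothetical lip-shape features (mouth width and mouth height, in arbitrary units). The training values are invented for the example and are not measured data; the names `fit`, `log_likelihood`, `classify`, and `TRAIN` are likewise illustrative.

```python
import math
from statistics import mean, pvariance


def fit(samples):
    """Estimate per-class feature means and variances.

    samples maps a vowel label to a list of feature tuples.
    Assumption: features are independent Gaussians within each class.
    """
    model = {}
    for vowel, rows in samples.items():
        columns = list(zip(*rows))
        means = [mean(c) for c in columns]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [max(pvariance(c), 1e-6) for c in columns]
        model[vowel] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log of the diagonal-Gaussian density of feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(model, x):
    """Return the vowel whose class model gives x the highest likelihood."""
    return max(model, key=lambda vowel: log_likelihood(x, *model[vowel]))


# Invented (width, height) training values, for illustration only.
TRAIN = {
    "a": [(50, 40), (52, 42), (48, 38)],
    "i": [(60, 10), (62, 12), (58, 9)],
    "u": [(30, 12), (32, 14), (28, 11)],
    "e": [(55, 25), (57, 27), (53, 24)],
    "o": [(35, 30), (37, 32), (33, 29)],
}

model = fit(TRAIN)
print(classify(model, (51, 39)))  # prints "a"
```

Under these assumptions, a new feature vector is assigned to the vowel whose estimated distribution makes it most probable. Working with log-likelihoods rather than raw densities avoids numerical underflow when the number of features grows.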