Noise and words with similar syllables degrade speech recognition when the method relies only on voice or sound data. Additional information obtained by lip-reading may improve recognition under such conditions.
This paper proposes a speech recognition system for a finite set of words spoken against a noisy background. The system, which has both an X-Y tracker and a microphone, detects the sound and the positions of the lip movements. The X-Y tracker processes the visual images and outputs a pair of X-Y coordinates, so the lip movements can be obtained in real time without imposing complicated processing on the main computer. The coordinate pair of a marked position near the lip is input together with the speech to form a pattern for speech recognition. The voice patterns, synchronized with the X-Y positions of the lip movements, are compared with standard patterns to recognize the spoken words. Weight coefficients that set the relative importance of the visual and voice data are introduced and selected to give the best performance for the registered words.
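The weighted comparison described above can be illustrated with a minimal sketch. The abstract does not specify the distance measure or how the weight coefficients enter the comparison, so the following assumes a weighted sum of mean Euclidean distances between equal-length, time-synchronized patterns. The names `pattern_distance`, `recognize`, and `w_voice` are hypothetical and are not taken from the paper.

```python
import math


def pattern_distance(a, b):
    """Mean Euclidean distance between two equal-length sequences of feature vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and of equal length")
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(a)


def recognize(voice, lips, templates, w_voice=0.5):
    """Return the registered word whose standard pattern is closest to the input.

    voice     -- sequence of voice feature vectors, one per frame
    lips      -- sequence of (x, y) lip-marker positions, one per frame
    templates -- dict mapping word -> (voice_pattern, lip_pattern)
    w_voice   -- assumed weight of the voice data; the lip data gets 1 - w_voice
    """
    if not 0.0 <= w_voice <= 1.0:
        raise ValueError("w_voice must lie in [0, 1]")
    w_lip = 1.0 - w_voice
    best_word, best_score = None, math.inf
    for word, (t_voice, t_lips) in templates.items():
        # Assumed combination rule: weighted sum of the two distances.
        score = (w_voice * pattern_distance(voice, t_voice)
                 + w_lip * pattern_distance(lips, t_lips))
        if score < best_score:
            best_word, best_score = word, score
    return best_word
```

In this sketch, setting `w_voice` to 1.0 reduces the system to voice-only recognition, while lower values let the lip positions separate words whose voice patterns are confusable in noise. Selecting the weight would then amount to trying several values on the registered words and keeping the one with the highest recognition rate.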
Experiments were carried out to confirm that the proposed system improves the recognition rate in the presence of continuous noise. The results show an improvement of about 10 percent at a signal-to-noise ratio of 35 dB, and about 20 percent at a signal-to-noise ratio of 26 dB.