Abstract
We have been developing a speech visualization system as a speech training aid for the hearing impaired. The aim of this research is to build a recognition system that can display a phoneme sequence on screen together with the visualized speech. Such a system must recognize 20-50 isolated words spoken by any speaker. Moreover, it is desirable that the dictionary vocabulary (20-50 words) can be freely and easily constructed or changed. A system that holds phoneme standard patterns independently of the word dictionary is a reasonable way to achieve this aim, but in general it does not always achieve a high recognition rate for a large vocabulary. To realize a high-performance system that satisfies these requirements, we adopted new speech parameters that are mutually complementary for recognition. Because the parameters do not share the same dimensions, the local distance between an input frame and each of 32 phoneme categories is defined as a weighted sum of the distances computed for the individual parameters. The weights were chosen to maximize the phoneme recognition rate and were then tested on word recognition. As a result, adding the parameters proved very effective, and the recognition rates obtained with the proposed parameters were much higher than those obtained with the parameter conventionally used. When 50 words uttered by 30 male speakers from two different databases were tested, the recognition rates were 97.2% and 98.3% with a distance based on the Bayes decision rule.
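The weighted-sum local distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names (`param_a`, `param_b`), their dimensions, the weight values, and the toy templates are all hypothetical, and a squared Euclidean distance stands in for the Bayes-decision-rule based distance, whose details the abstract does not give. Only the count of 32 phoneme categories comes from the text.

```python
N_PHONEMES = 32  # number of phoneme categories stated in the abstract


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors.

    Assumption: a stand-in for the per-parameter distance, which the
    abstract does not specify in detail.
    """
    return sum((a - b) ** 2 for a, b in zip(x, y))


def local_distances(frame, templates, weights):
    """Return one local distance per phoneme category for a single frame.

    frame:     dict mapping parameter name -> feature vector of the frame
    templates: dict mapping parameter name -> list of 32 reference vectors
    weights:   dict mapping parameter name -> scalar weight (hypothetical)
    """
    distances = []
    for k in range(N_PHONEMES):
        # Weighted sum over parameters; each parameter may have its own size.
        d = sum(
            w * squared_euclidean(frame[name], templates[name][k])
            for name, w in weights.items()
        )
        distances.append(d)
    return distances


# Toy data: two hypothetical parameters with different dimensions.
dims = {"param_a": 4, "param_b": 2}
templates = {
    name: [[float(k + j) for j in range(dim)] for k in range(N_PHONEMES)]
    for name, dim in dims.items()
}
weights = {"param_a": 1.0, "param_b": 0.5}  # hypothetical values

# Use the template of category 7 as the input frame.
frame = {name: list(templates[name][7]) for name in dims}
d = local_distances(frame, templates, weights)
best = min(range(N_PHONEMES), key=d.__getitem__)
print(best, d[best])
```

Because each parameter contributes its own scalar distance before weighting, parameters of different sizes can be combined without being concatenated into one vector. In the abstract the weights are selected to maximize the phoneme recognition rate; here they are simply fixed by hand.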