A statistical method for evaluating the monosyllable identification rate is developed in order to investigate the characteristics of consonant identification and to find the optimal parameters of our voice recognition system. In our system, each monosyllable is converted into a time-spectral pattern of 16 frequency components by 16 time components after envelope matching, consonant extraction, time smoothing, and level normalization. Each input pattern is compared with the standard patterns that have the same vowel as the input, using minimum squared-distance classification. Each input pattern is represented as a matrix X drawn from the population G_i (i ≤ 15) of its monosyllable, which is assumed to follow a multivariate normal distribution. The error rate is then estimated from the probability P_b(j/i) that an X from G_i falls on the G_j side of the discriminating plane between the two populations G_i and G_j. Experiments on 30 × 15 monosyllables with the vowel /a/, pronounced by a 25-year-old male speaker, show that the estimated error rate behaves in the same way as the experimentally measured one and that our envelope matching method is nearly as effective as shift matching. By computing the error rate over a range of parameter settings, the optimal consonant extraction segment, time-smoothing window length, and level normalization method are determined. These evaluations also show that the optimal parameters for the unvoiced plosives /k/, /t/, and /p/ differ from those for the other consonants, so a separate preprocessing method will be needed for the unvoiced plosives to raise the identification rate.
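The pairwise probability P_b(j/i) can be illustrated with a short Python sketch. It rests on assumptions the abstract does not state: the discriminating plane is taken to be the plane equidistant from the two class means (the boundary implied by minimum squared-distance classification against mean standard patterns), each 16 × 16 pattern is flattened into a 256-dimensional vector, and per-class errors are combined by summing the pairwise terms (a union bound) and averaging over classes with equal priors. Under these assumptions, if X ~ N(m_i, Σ_i) and w = m_j − m_i, then X crosses the plane exactly when wᵀ(X − m_i) > ‖w‖²/2, so P_b(j/i) = Φ(−‖w‖² / (2·√(wᵀΣ_i w))). The names `classify`, `pairwise_error`, and `error_rate` are hypothetical, not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed with math.erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def classify(x, standards):
    """Minimum squared-distance rule: index of the nearest standard pattern."""
    return min(range(len(standards)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, standards[k])))


def pairwise_error(mean_i, mean_j, cov_i):
    """P_b(j/i): probability that X ~ N(mean_i, cov_i) lies on G_j's side of
    the plane equidistant from mean_i and mean_j (assumed boundary)."""
    w = [b - a for a, b in zip(mean_i, mean_j)]  # normal vector w = m_j - m_i
    d2 = sum(v * v for v in w)                   # ||w||^2
    # Variance of the projection w^T X under G_i: w^T Sigma_i w.
    var = sum(w[r] * sum(cov_i[r][c] * w[c] for c in range(len(w)))
              for r in range(len(w)))
    if var == 0.0:
        return 0.0  # no spread along w, so X stays on G_i's side
    # X crosses the plane exactly when w^T (X - m_i) > ||w||^2 / 2.
    return normal_cdf(-d2 / (2.0 * math.sqrt(var)))


def error_rate(means, covs):
    """Overall error rate, assuming equal class priors and bounding each
    class's error by the sum of its pairwise terms (union bound)."""
    n = len(means)
    per_class = [
        min(1.0, sum(pairwise_error(means[i], means[j], covs[i])
                     for j in range(n) if j != i))
        for i in range(n)
    ]
    return sum(per_class) / n


if __name__ == "__main__":
    # Toy 2-D example (not data from the paper): unit-variance classes 2 apart.
    means = [[0.0, 0.0], [2.0, 0.0]]
    covs = [[[1.0, 0.0], [0.0, 1.0]]] * 2
    print(round(pairwise_error(means[0], means[1], covs[0]), 4))  # → 0.1587
    print(classify([1.8, 0.1], means))  # → 1
```

The formula needs only the scalar wᵀΣ_i w, so it remains defined even when Σ_i is a singular sample covariance, as it would be if 256-dimensional patterns were estimated from only 30 utterances per monosyllable.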