An Evaluation Method for Optimal Parameters in a Monosyllabic Speech Recognition System

似鳥 寧信; 伊福部 達

doi:10.20697/jasj.40.2_63

Abstract

A statistical method for evaluating the monosyllabic voice identification rate is developed in order to investigate the characteristics of consonant identification and to find the optimal parameters for our voice recognition system. In our system, every monosyllable is converted into a time-spectral pattern with 16 components in frequency and 16 components in time through envelope matching, consonant extraction, time smoothing, and level normalization. Each input pattern is compared with the standard patterns that have the same following vowel as the input, using minimum squared distance classification. Every input pattern is represented as a matrix X, an element of the population G_i (i≤15) of one monosyllable, which is assumed to follow a multi-dimensional normal distribution; the error rate is estimated from the probability P_b(j/i) that X in G_i falls in the space of G_j beyond the discriminating plane between the two populations G_i and G_j. Experimental results with 30×15 monosyllables whose consonant is followed by the vowel /a/, pronounced by a 25-year-old male speaker, show that the characteristics of the estimated error rate coincide with those of the experimental data, and that our envelope matching method has almost the same effect as shift matching. By calculating the error rates for various parameter settings, the optimal values and method are obtained for the consonant extraction part, the window length of the time smoothing, and the level normalization method. Furthermore, the unvoiced plosives /k/, /t/, and /p/ are found to have optimal parameters different from those of the other consonants in these evaluations, so a different pre-processing method will be needed for the unvoiced plosives in order to increase the identification rate.
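The pairwise error probability P_b(j/i) described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the following rests on stated assumptions: each 16×16 pattern is flattened into a vector, each class is summarized by a mean vector (its standard pattern) and a covariance matrix, and the discriminating plane is taken to be the perpendicular bisector of the two class means, which is the boundary implied by a minimum squared distance rule. The function names are hypothetical and are not taken from the paper.

```python
import math


def nearest_standard(x, standards):
    """Return the index of the standard pattern closest to x.

    Minimum squared Euclidean distance classification; `standards`
    is a list of flattened patterns (assumed representation).
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(range(len(standards)), key=lambda k: sq_dist(x, standards[k]))


def pairwise_error_probability(mean_i, mean_j, cov_i):
    """Estimate P_b(j/i) for X ~ N(mean_i, cov_i).

    Assumption: the discriminating plane is the perpendicular
    bisector of mean_i and mean_j. The projection of X onto
    w = mean_j - mean_i is then one-dimensional normal, so the
    probability of crossing the plane is a Gaussian tail.
    """
    n = len(mean_i)
    w = [b - a for a, b in zip(mean_i, mean_j)]
    norm_sq = sum(c * c for c in w)
    # Variance of the projection w^T X is w^T cov_i w.
    var = sum(w[r] * cov_i[r][c] * w[c] for r in range(n) for c in range(n))
    if norm_sq == 0.0 or var <= 0.0:
        raise ValueError("means must differ and w^T cov w must be positive")
    # Distance from mean_i to the plane, in projection standard deviations.
    z = norm_sq / (2.0 * math.sqrt(var))
    # Upper tail of the standard normal: 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy two-dimensional example; the real patterns would have 256 components.
    mean_a, mean_b = [0.0, 0.0], [2.0, 0.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(pairwise_error_probability(mean_a, mean_b, identity))
    print(nearest_standard([1.6, 0.1], [mean_a, mean_b]))
```

Under these assumptions, shrinking the within-class variance along the direction joining two class means, or moving the means apart, lowers the estimated error. This gives one way to compare pre-processing settings such as the smoothing window length without rerunning a full recognition experiment for each setting.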
