2015, Vol. 19, No. 1, pp. 44-52
This paper introduces an automatic Mandarin pronunciation evaluation method. Its goal is a computer-based system that can partly replace human examiners in the Putonghua Shuiping Ceshi (PSC) in China. The method uses statistical modeling to learn the mapping from recorded speech waveforms to pronunciation proficiency scores. It has three main modules: a frontend module, an evaluation feature extraction module, and a mapping module. In the frontend module, hidden Markov model (HMM)-based acoustic models are trained to describe the distribution of acoustic features for standard pronunciation. In the evaluation feature extraction module, the trained acoustic models are used to compute posterior probabilities of the segmental and tonal acoustic features in each examinee's speech. These posterior probabilities, together with a duration feature, form the feature vector used to predict pronunciation scores. Finally, the mapping module applies piecewise linear regression to map each examinee's evaluation feature vector to a pronunciation score. In our implementation, the piecewise linear regression is realized by cascading an SVM classifier with a separate linear regression for each class. In an experiment on real PSC test data from 5,420 speakers, the system built with the proposed method achieved a correlation of 0.901 between the predicted scores and the scores given by human examiners for the first three sections of the PSC test. In a second experiment comparing the system with 20 human examiners, the system ranked second and outperformed most of the human examiners in evaluation accuracy.
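The evaluation feature extraction step can be illustrated with a minimal Python sketch. It assumes that the HMM frontend has already produced a log-likelihood for each segment under the canonical (expected) unit and under each competing unit: phones for segmental features, tones for tonal features. From these it computes the log posterior of the canonical unit under equal priors, a goodness-of-pronunciation style normalization. The names `log_posterior` and `examinee_features`, the equal-prior assumption, and the averaging over segments are hypothetical choices for illustration, not the paper's exact formulation. The duration feature is passed through as an opaque number because its definition is not given here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One segment: (log-likelihood per candidate unit, canonical unit label).
Segment = Tuple[Dict[str, float], str]


def log_posterior(loglik: Dict[str, float], canonical: str) -> float:
    """Log posterior of the canonical unit, assuming equal priors (an assumption).

    log P(c | O) = log p(O | c) - log sum_k p(O | k), computed with
    log-sum-exp so that very negative log-likelihoods do not underflow.
    """
    m = max(loglik.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in loglik.values()))
    return loglik[canonical] - log_norm


def examinee_features(
    segments: Sequence[Segment],
    tones: Sequence[Segment],
    duration_feature: float,
) -> List[float]:
    """Hypothetical per-examinee feature vector: [segmental, tonal, duration].

    Averaging log posteriors over segments is an illustrative choice;
    the duration feature is supplied by the caller unchanged.
    """
    seg = sum(log_posterior(ll, c) for ll, c in segments) / len(segments)
    ton = sum(log_posterior(ll, c) for ll, c in tones) / len(tones)
    return [seg, ton, duration_feature]


if __name__ == "__main__":
    # Toy log-likelihoods standing in for HMM frontend output.
    phones = [({"a": -40.0, "o": -45.0}, "a"), ({"sh": -30.0, "s": -29.0}, "sh")]
    tone_segs = [({"T1": -12.0, "T2": -15.0, "T3": -16.0, "T4": -14.0}, "T1")]
    print(examinee_features(phones, tone_segs, duration_feature=0.8))
```

A segment pronounced close to the standard model gets a log posterior near 0, while a segment that a competing model explains better gets a strongly negative value, so the averages act as per-examinee proficiency cues for the mapping module.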
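The mapping module's piecewise linear regression, a classifier that routes each examinee to a class followed by a class-specific linear regression, can be sketched in dependency-free Python. This sketch swaps in a nearest-centroid classifier for the paper's SVM and defines the classes by hypothetical score thresholds; both are assumptions made for illustration. Each class gets its own least-squares fit. A Pearson correlation helper is included, assuming that the correlation reported in the abstract is Pearson's.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(xs: Sequence[Vector], ys: Sequence[float], ridge: float = 1e-9) -> Vector:
    """Least-squares weights [bias, w1, ..., wd] from the normal equations.

    The tiny ridge term is a numerical stabilizer added here, not from the paper.
    """
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(ata, aty)


class PiecewiseRegressor:
    """Classifier + per-class linear regression (nearest centroid stands in for SVM)."""

    def __init__(self, thresholds: Sequence[float]):
        # Hypothetical score-band boundaries that define the classes.
        self.thresholds = sorted(thresholds)

    def _band(self, y: float) -> int:
        return sum(y >= t for t in self.thresholds)

    def fit(self, xs: Sequence[Vector], ys: Sequence[float]) -> "PiecewiseRegressor":
        groups: Dict[int, list] = {}
        for x, y in zip(xs, ys):
            groups.setdefault(self._band(y), []).append((x, y))
        self.centroids = {k: [sum(c) / len(g) for c in zip(*(x for x, _ in g))]
                          for k, g in groups.items()}
        self.models = {k: fit_linear([x for x, _ in g], [y for _, y in g])
                       for k, g in groups.items()}
        return self

    def classify(self, x: Vector) -> int:
        return min(self.centroids,
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(x, self.centroids[k])))

    def predict(self, x: Vector) -> float:
        w = self.models[self.classify(x)]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between predicted and reference scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

Because each class has its own line, the overall mapping can bend at class boundaries, which a single global linear regression cannot do. In a full system, an SVM such as scikit-learn's `SVC` trained on the class labels would replace `classify`.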