2025, Vol. 10, pp. 1–25
Though automatic speech recognition (ASR) can easily be used to create web-based speaking tools, there is a need to derive new measures from the generated ASR transcripts and evaluate how well these measures correlate with human rater scoring. This study utilized 61 audio samples from a read-aloud speaking task performed by Japanese EFL learners. Six human raters judged their pronunciation and fluency. ASR transcripts were obtained and transformed into a number of measures of fluency and pronunciation. Raw correlations and performance in regression models were used to evaluate the measures, and a scoring model was created to match the raters’ amalgamated scores on a 1–5 scale. I found that the time to complete the task (T), the number of extra words in the ASR transcript (extraW), and speech rate (SR) were meaningful measures of fluency. I also found that although the penalized pronunciation score (penP) is a meaningful measure of pronunciation, fine-grained measures based on the inclusion of phrases were equally meaningful. Finally, several of these measures could be combined into a scoring model that showed 100% accuracy in predicting the rater scores of the original 61 audio files. However, it is unknown how well the model will generalize to new datasets.
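As an illustrative sketch only (not the paper's actual implementation), the three fluency measures named above could be computed from a reference passage, an ASR transcript, and the audio duration along the following lines; the function name and exact formulas are assumptions for demonstration.

```python
from collections import Counter

def fluency_measures(reference: str, asr_transcript: str, duration_s: float):
    """Return (T, extraW, SR), loosely following the abstract's definitions.

    Hypothetical helper for illustration: T is the time to complete the
    task, extraW counts ASR words beyond the reference word counts, and
    SR is speech rate in words per second.
    """
    ref_words = reference.lower().split()
    asr_words = asr_transcript.lower().split()

    T = duration_s  # time to complete the task (seconds)
    # extraW: for each word type, count ASR occurrences exceeding the
    # reference occurrences (Counter subtraction drops non-positive counts)
    extra = Counter(asr_words) - Counter(ref_words)
    extraW = sum(extra.values())
    SR = len(asr_words) / duration_s  # speech rate (words per second)

    return T, extraW, SR

# Usage: one repeated word and one filler ("um") yield extraW = 2
T, extraW, SR = fluency_measures(
    "the quick brown fox", "the the quick brown fox um", 4.0)
```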