2024, Vol. 27, pp. 3-24
The main purpose of this study is to examine how far the scores produced by an AI-powered essay-scoring system for a task-based writing test (TBWT) can be relied on (scoring validity). The TBWT contains two elicitation tasks: Task 1, focusing on Accuracy, and Task 2, focusing on Communicability. Japanese high school students participated in the study. They took the TBWT online and answered a survey about their opinions of the grades assigned by the system. To examine scoring validity, a two-way mixed-design ANOVA was conducted on their TBWT scores. The results indicated that 1) the five groups created on the basis of the grades differed significantly in both Accuracy and Communicability grades, 2) the cut-off scores of the levels for different words in a text (type levels) should be adjusted for Accuracy grades, and 3) the cut-off scores of type levels and quality of ideas need to be adjusted for Communicability grades. However, the survey results showed that most of the students agreed with their grades (A, B+, B, B-, or C), which were assigned according to the cut-off score set for each grade. These findings are discussed with a view to further improving the system.