2026, Vol. 30, No. 2, pp. 348-353
This study examined the application of ChatGPT in the evaluation of pre-service principals within Taiwan’s national school leadership training program. As generative AI technologies become increasingly embedded in education, understanding their role in professional assessment is essential. This research explored whether AI-generated scores align with human ratings, and whether ChatGPT can serve as a reliable tool for providing formative feedback. A total of 131 pre-service principals submitted School Improvement Plans, which were scored by both human evaluators (mentor principals and external scholars) and ChatGPT. The scoring rubric covered seven dimensions of leadership competency, with both rater groups scoring each assignment on a five-point scale. Descriptive statistics, Pearson correlations, and intraclass correlation coefficients (ICCs) were used to compare scoring patterns, consistency, and reliability. Findings show that human raters consistently assigned higher and more variable scores than ChatGPT, which produced more conservative and evenly distributed ratings. Although both AI and human scores showed internal construct coherence, correlation patterns suggested that ChatGPT applied a more differentiated, data-driven evaluation logic. ICC analysis revealed moderate single-rater reliability and high average-rater reliability for both human and AI assessments. The results indicate that ChatGPT holds potential as a supplementary assessment assistant, particularly in delivering structured, rubric-based feedback. However, discrepancies in scoring tendencies and contextual interpretation underscore the need for human oversight. These findings offer implications for the thoughtful integration of AI in leadership development and broader educational assessment systems.
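The abstract reports single-rater and average-rater reliability via ICCs alongside Pearson correlations between human and ChatGPT scores. The snippet below is a minimal illustrative sketch of how such a comparison could be computed, assuming long-format data with hypothetical column names (plan_id, rater, score) and the pandas, SciPy, and pingouin libraries; it is not the authors' actual analysis code, and the numbers shown are placeholder values.

import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Hypothetical example: two rater groups (human average vs. ChatGPT) scoring
# the same School Improvement Plans on a five-point scale.
scores = pd.DataFrame({
    "plan_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "rater":   ["human", "chatgpt"] * 5,
    "score":   [4.3, 3.8, 4.6, 4.0, 3.9, 3.7, 4.8, 4.1, 4.2, 3.9],
})

# Pearson correlation between the two raters' scores across plans.
wide = scores.pivot(index="plan_id", columns="rater", values="score")
r, p = pearsonr(wide["human"], wide["chatgpt"])
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# Intraclass correlation coefficients: in pingouin's output, ICC2 corresponds
# to single-rater reliability and ICC2k to average-rater reliability under a
# two-way random-effects model.
icc = pg.intraclass_corr(data=scores, targets="plan_id",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

In practice, the same computation would be repeated per rubric dimension (or on dimension-level scores stacked as targets), which is one way to obtain the dimension-wise consistency patterns the abstract describes.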