Rater Reliability in Classroom Speaking Assessment in a Japanese Senior High School

Rie KOIZUMI; Akiyo WATANABE

doi:10.20581/arele.32.0_129

Abstract

When teachers score classroom speaking tests, intensive rater training ahead of the test may not always be possible. The current study examines the extent to which rater reliability can be maintained using a simple rubric without detailed rater training. We analyzed four speaking tests for senior high school students (N = 116). The tests, administered over seven months, comprised an individual presentation, a paired role play, and two group discussions. Each test was scored with a simple rubric by two or more raters who had not received intensive rater training. The data were analyzed using many-facet Rasch measurement and generalizability theory. The results suggest that, in general, raters scored similarly and consistently. At the overall test level, one to four raters were needed to reach sufficient reliability (Φ = .70), with the group discussion tests requiring more raters or intensive rater training. Pedagogical implications for the allocation of limited time and rater resources are discussed.
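The link between the number of raters and the dependability coefficient Φ can be illustrated with a minimal decision-study sketch. It assumes a single-facet crossed design (persons by raters), which may be simpler than the design used in the study, and the variance components in the usage lines are hypothetical values chosen for illustration, not results from the paper. The function names `phi` and `min_raters` are likewise introduced here only for the example.

```python
def phi(var_p, var_r, var_pr_e, n_raters):
    """Dependability coefficient for a crossed persons x raters design.

    var_p    : person (true-score) variance component
    var_r    : rater variance component (rater severity differences)
    var_pr_e : person-by-rater interaction confounded with residual error
    n_raters : number of raters whose scores are averaged
    """
    # Absolute error variance shrinks as scores are averaged over more raters.
    abs_error = (var_r + var_pr_e) / n_raters
    return var_p / (var_p + abs_error)


def min_raters(var_p, var_r, var_pr_e, target=0.70, max_raters=20):
    """Smallest number of raters whose phi reaches the target, or None."""
    for n in range(1, max_raters + 1):
        if phi(var_p, var_r, var_pr_e, n) >= target:
            return n
    return None


# Hypothetical variance components (NOT taken from the study).
example_n = min_raters(var_p=2.0, var_r=0.5, var_pr_e=1.5)
print(example_n)
```

Under these assumed components, a test with larger rater or residual variance relative to person variance needs more raters to reach the same Φ, which is the general pattern the abstract reports for the group discussion tests.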
