Abstract
This study used generalizability theory, a statistical framework for estimating the reliability of observations, to explore how Japanese teachers evaluate essays written by Chinese learners of Japanese. The essays were written by 16 learners (8 intermediate-level and 8 advanced-level). Two groups of raters evaluated the essays: 6 raters who had taught Japanese writing for more than five years and 7 graduate students majoring in Japanese language teaching. The raters evaluated the learners’ essays using a multiple-trait rubric developed by Tanaka and Nagasaka (2006), which comprises five traits (content, organization, reader-friendliness, accuracy, and appropriateness). The results suggested that teachers with more than five years of teaching experience do not necessarily evaluate learners’ essays with high reliability. Using a simulation study, we further examined how reliability would change depending on the number of raters and evaluation items. On the basis of these results, we raise a note of caution for research on essay evaluation and discuss pedagogical implications.
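The dependence of reliability on the number of raters and evaluation items can be illustrated with a minimal sketch. It assumes a fully crossed essays × raters × items design with raters and items treated as random facets, which the abstract does not specify. The function name `g_coefficient` and all variance components below are hypothetical placeholders chosen for illustration; they are not results from this study.

```python
def g_coefficient(var_p, var_pr, var_pi, var_pri_e, n_raters, n_items):
    """Relative generalizability coefficient for an assumed crossed p x r x i design.

    var_p     : variance due to essays (the object of measurement)
    var_pr    : essay-by-rater interaction variance
    var_pi    : essay-by-item interaction variance
    var_pri_e : three-way interaction variance confounded with residual error
    """
    # Relative error variance shrinks as more raters and items are averaged over.
    relative_error = (
        var_pr / n_raters
        + var_pi / n_items
        + var_pri_e / (n_raters * n_items)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components (assumed values, NOT taken from the study).
components = dict(var_p=0.50, var_pr=0.10, var_pi=0.05, var_pri_e=0.30)

# Tabulate the projected coefficient for several rater and item counts.
for n_raters in (1, 2, 4, 6):
    for n_items in (1, 3, 5):
        g = g_coefficient(**components, n_raters=n_raters, n_items=n_items)
        print(f"raters={n_raters} items={n_items} G={g:.3f}")
```

Under these assumptions the coefficient can only increase as raters or items are added, and the gain from each additional rater or item diminishes, which is the kind of trade-off such a simulation is meant to expose.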