Abstract
The aim of this paper is to investigate why it is difficult for raters to agree on writing assessment. Writing performance tests may be authentic; however, rater variation can be a potential weakness. For reliable assessment, sharing the rubric and training raters are essential. Tanaka et al. (2009) developed a multiple-trait scoring rubric for academic writing in Japanese as a second language and held a workshop for experienced teachers. Eight teachers who participated in the workshop were later asked to evaluate two types of essays, rating 26 essays of each type. The results showed a high reliability coefficient for the assessment, although discrepancies among ratings were observed for a few essays. To investigate these discrepancies, questionnaires were administered and a moderation meeting was held. An examination of the assessment results, the questionnaires, and the moderation meeting identified several factors that may have made agreement on writing assessment difficult: traits, prompts, level, writing capability, and individual differences among raters. This study confirmed both the authenticity of writing performance tests and the difficulty of achieving consistency among raters in writing assessment. These difficulties, however, may be addressed by improving the rubric, analyzing assessment processes, and utilizing moderation meetings.