Abstract
Peer assessment has become popular as an alternative evaluation tool in EFL classrooms and is also recognized as a useful method for active and collaborative learning. There is, however, the issue of rater bias (Farrokhi, Esfandiari, & Schaefer, 2012; Matsuno, 2009), which can have negative consequences in peer assessment. One way to avoid such unfairness is to design a measurement scale that works equally well for inexperienced evaluators, such as the participants in peer assessment, and for experienced raters. This study therefore explores the potential of an assessment instrument to mitigate rater bias in peer assessment. To this end, the author adopted an empirically derived, binary-choice, boundary-defined (EBB) rating scale. Many-Facet Rasch Measurement (MFRM) was used to analyze 45 sets of essay writing data scored by 5 Japanese raters (3 experienced and 2 inexperienced). Rater severity or leniency and rater bias patterns were examined and compared across two types of rating scales. The results revealed that, despite the use of these rating scales, the same patterns of rater bias occurred as were found in previous studies (Farrokhi et al., 2012; Matsuno, 2009). Moreover, the study confirmed that the language ability of inexperienced raters could influence their rating tendencies: an inexperienced rater may behave like an experienced evaluator if his or her language proficiency is relatively high. On the basis of these findings, the paper discusses the implications of using empirically developed rating scales in EFL classroom writing assessment.