1994 Volume 21 Issue 2 Pages 21-31
The purpose of this study is to investigate the reliability of an essay test when the scores are reduced to a small number of rating categories. A total of 165 essays written by high school students were evaluated holistically and given marks between 0 and 100 points. The marks were then classified into 2 to 5 rating categories according to two criteria: the percentiles and the standard deviation of each rater's original mark distribution. Reliability coefficients were calculated from the values of information functions obtained by applying graded response models. The reliability of the 5-category scores (standard deviation criterion) was found to be almost as high as that of the original 100-point marks, which indicates the validity of the categorical rating. The characteristics of some of the raters were also described using their information functions.
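
The abstract does not state the exact cut points used for either criterion, so the following is only a minimal Python sketch of how 0 to 100 marks from one rater might be reduced to k categories. The function names, the equal-frequency quantile cuts, the one-SD-wide bands centred on the rater's mean, and the sample marks are all illustrative assumptions, not the authors' procedure.

```python
from bisect import bisect_right
from statistics import mean, quantiles, stdev


def categorize_by_percentile(marks, k):
    """Assign each mark to one of k roughly equal-frequency categories (1..k)."""
    # Assumption: cut points are the k-quantiles of this rater's own marks.
    cuts = quantiles(marks, n=k)
    return [bisect_right(cuts, m) + 1 for m in marks]


def categorize_by_sd(marks, k):
    """Assign each mark to one of k categories (1..k) using SD-based cut points."""
    # Assumption: cut points sit at mean + (i - k/2) * SD for i = 1..k-1,
    # e.g. -1.5, -0.5, +0.5, +1.5 SD when k = 5. The two outer categories
    # are open-ended; interior categories are one SD wide.
    mu, sd = mean(marks), stdev(marks)
    cuts = [mu + (i - k / 2) * sd for i in range(1, k)]
    return [bisect_right(cuts, m) + 1 for m in marks]


# Hypothetical marks from a single rater, for illustration only.
rater_marks = [35, 48, 52, 60, 61, 67, 70, 74, 82, 95]
for k in range(2, 6):
    print(k, categorize_by_percentile(rater_marks, k), categorize_by_sd(rater_marks, k))
```

Because both functions take their cut points from the marks passed in, applying them separately to each rater's marks mirrors the abstract's statement that the criteria are based on each rater's original mark distribution.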