Abstract
This report is a follow-up to a study by Shizuka (1999a; 1999b), in which a new scheme for classifying reading test items was proposed and tried out. In this scheme, an item is placed in one of nine categories according to its position on two dimensions: the size of the relevant text portion, and the depth of cognitive processing required to respond correctly to the item. In the trial phase, four testing experts were asked to apply the taxonomy to the Cambridge KET, PET and FCE. While an acceptable agreement rate was observed for the two lower-level tests (KET and PET), inter-classifier reliability for the FCE was rather low. The present paper focuses on a detailed qualitative and introspective examination of the FCE items over which the classifiers' perceptions differed considerably in Shizuka's (1999a; 1999b) trial. The analyses revealed several different ways in which language testers view specific items, highlighting limitations of the adopted scheme.