The purpose of this paper is to investigate score changes among test re-takers in the Japanese university entrance examination system. This study uses the score data of examinees who took the test of the Educational Test Research Institute in 1964 and took it again in 1965. The Educational Test Research Institute test (NOKEN Test) was conducted from 1963 to 1968.
The analysis shows that the rank order of examinees on the second test (taken as seniors, 12th grade) differs little from their rank order on the first test (taken as juniors, 11th grade), and that the score change between the two tests can be interpreted as a regression effect. Furthermore, when the second-test score is added to the first-test score to form a total score, the multiple correlation coefficient adjusted for degrees of freedom declines in a multiple regression analysis.
These results suggest that, in the Japanese university entrance examination system, administering examinations twice or more within the same year, or using a test score from a previous year, is not necessarily meaningful, at least in the NOKEN test era.
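The regression effect described above can be illustrated with a small simulation (a hypothetical sketch with made-up parameters, not the NOKEN data): when two test scores are noisy measures of the same latent ability, examinees who score at the extremes on the first test tend, as a group, to score closer to the population mean on the retake.

```python
import random

random.seed(0)

# Two noisy measurements of the same latent ability
# (illustrative parameters only).
abilities = [random.gauss(50, 10) for _ in range(10000)]
test1 = [a + random.gauss(0, 5) for a in abilities]
test2 = [a + random.gauss(0, 5) for a in abilities]

# Take the top decile on the first test and compare group means.
paired = sorted(zip(test1, test2), reverse=True)
top = paired[: len(paired) // 10]
mean_t1 = sum(t1 for t1, _ in top) / len(top)
mean_t2 = sum(t2 for _, t2 in top) / len(top)

# Regression toward the mean: the group's second-test mean falls
# back toward the population mean of 50, with no real ability change.
print(mean_t1 > mean_t2 > 50)
```

Because both scores contain independent error, the correlation between them is below 1, so an extreme group on one test is expected to be less extreme on the other even without any change in ability.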
In this study, we analyzed an English language entrance test of Kansai University administered in the past to validate its structural aspect. We also investigated the extent to which each section of the test influenced the overall test score. By applying multidimensional item response theory (MIRT), we found that a bifactor model with a general factor and lower-level factors reflecting differences among sections and reading materials fit the data better than other models. In addition, one section of the test was confirmed to effectively distinguish the abilities of test takers. These results suggest that the test retains desirable traits as a measurement instrument for entrance examination purposes in terms of the structural aspect of validity and the assessment of test takers’ ability. At the same time, this study points to the importance of applying language testing theory in the process of creating, administering, and analyzing a high-stakes test. Implications of employing test theories are discussed in view of the current situation of entrance examination administration in Japan.
The more complicated and diverse the psychological constructs a test measures, the more sophisticated the methods required to study them. Recently, multidimensional IRT has been used in such studies to examine the existence and structure of subscales of achievement tests. The present study used data from a test administered to all eighth-grade students in Niigata prefecture (N = 9,102) and examined in detail what the subscales measure, using multidimensional IRT (a bifactor model). The results show that group factors have more impact than the general factor on 2 of the 25 items, and a qualitative examination of these items confirmed this result. This implies that multidimensional IRT can provide more item-level information than conventional (unidimensional) IRT and has the potential to better examine measurement constructs.
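In a bifactor model of the kind used in the two studies above, each item loads on one general factor and at most one group factor, and the relative size of the two discriminations determines which factor dominates the item. A minimal sketch of the 2PL-type bifactor response function, with purely illustrative parameter values (not the estimates from either study):

```python
import math

def bifactor_prob(theta_g, theta_s, a_g, a_s, d):
    """2PL-type bifactor item response probability:
    P = logistic(a_g * theta_g + a_s * theta_s + d),
    where theta_g is the general ability, theta_s the group-specific
    ability, a_g and a_s the corresponding discriminations, d the intercept."""
    z = a_g * theta_g + a_s * theta_s + d
    return 1.0 / (1.0 + math.exp(-z))

# An item dominated by the general factor vs. one dominated by a
# group factor (illustrative discrimination values only).
p_general_item = bifactor_prob(theta_g=1.0, theta_s=0.0, a_g=1.8, a_s=0.2, d=0.0)
p_group_item = bifactor_prob(theta_g=1.0, theta_s=0.0, a_g=0.2, a_s=1.8, d=0.0)
print(round(p_general_item, 3), round(p_group_item, 3))  # → 0.858 0.55
```

For an examinee who is strong on the general factor but average on the group factor, the general-factor-dominated item is much more likely to be answered correctly, which is exactly the kind of item-level contrast the bifactor analysis exposes.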
For free-response questions (constructed-response, non-multiple-choice questions) in mathematics, the same score is assigned as long as the logical reasoning leading to the correct final answer is shown, regardless of the particular problem-solving strategy chosen. However, anyone who has graded exams knows that some responses deserve credit for their mathematical thinking, decision-making, and problem-solving skills even when the final answer is not reached. In this research, to explore a means of evaluation different from the usual “scoring”, focusing on problem-solving strategies, we reanalyzed students' answers to free-response questions by classifying them according to their problem-solving strategies. As a result, we observed a stronger correlation between the choice of problem-solving strategy and overall academic performance in mathematics (e.g., scores on the mathematics section of the National Center for University Entrance Examination) than between the correct-answer ratio on a particular question and overall academic performance in mathematics.
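Since strategy choice is categorical and overall performance is continuous, one natural way to quantify their association is the correlation ratio (eta), the square root of the between-group share of total variance. A sketch with made-up scores grouped by hypothetical strategies (not the study's data or its actual statistic):

```python
def correlation_ratio(groups):
    """Eta: sqrt(between-group sum of squares / total sum of squares),
    for continuous scores grouped by a categorical variable (strategy)."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    return (ss_between / ss_total) ** 0.5

# Hypothetical overall mathematics scores grouped by strategy A / B / C.
strategy_scores = [
    [72, 80, 85, 78],  # strategy A
    [55, 60, 58, 62],  # strategy B
    [65, 70, 68, 66],  # strategy C
]
eta = correlation_ratio(strategy_scores)
print(round(eta, 3))
```

A high eta here means that knowing which strategy a student chose tells you a lot about their overall performance, which is the pattern the study reports.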
We investigated the choice ratios of attractors, that is, the incorrect choices reflecting typical errors made by students during reading comprehension tests, by examining the effects of lexical overlap between an attractor and the key sentences related to each choice. We also investigated the effects of question type, represented by lower- and upper-level questions, on attractor decisions. The former questions asked test takers only to identify key sentences and evaluate options, whereas the latter asked them to grasp the structure or the gist of a paragraph or a passage. Undergraduate student test takers (N = 460) participated. They were given one of eight booklets. Experimental items consisted of one key and three attractors: a negation, an antonym, and a causal misunderstanding. Estimates and generated quantities in a Bayesian hierarchical model obtained via Gibbs sampling indicated that for lower-level questions, test takers with low proficiency selected attractors with overlapping words, whereas those with high proficiency chose attractors with negations or antonyms in the non-overlapping condition. In contrast, for upper-level questions, less proficient students chose attractors in the non-overlapping condition and proficient students selected attractors with negations or antonyms in the overlapping condition. These results suggest that examining attractors in multiple-choice tests could enable us to develop optimal items and to assure the quality of test items.
Multiple correspondence analysis is applied to binary data encoded from examinees’ responses to test items, which compose prototypes of tests intended to measure university applicants’ basic academic abilities in two areas: “Practical Reading” and “Mathematical Thinking”. The abilities to be measured in each area are further classified at a lower level, and each test item has classification labels attached in advance. A three-dimensional solution is obtained and compared to the classified-ability labels attached to each item. The first dimension is interpreted as general basic academic ability. The second dimension separates the items that require handling numbers and formulas or understanding relatively simple rules from the items that require handling logical problems. The third dimension expresses the necessity of higher-order thinking. The interpretation supports the validity of the prototype, although the cumulative percentage of eigenvalues up to the third dimension is low and the efficiency of dimensional reduction is moderate. The test quality could be improved by developing more difficult items and/or collecting data from examinees with a wider range of abilities, so that more items contribute to differentiating the abilities of the examinees.
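The core computation behind correspondence analysis of a binary response matrix can be sketched in a few lines: form the correspondence matrix, remove the trivial dimension by subtracting the product of the margins, rescale by the row and column masses, and take a singular value decomposition; the squared singular values are the principal inertias (eigenvalues) whose cumulative percentage the abstract refers to. This is a simplified sketch on a tiny made-up matrix, not the study's full MCA indicator coding:

```python
import numpy as np

def correspondence_analysis(X):
    """Simple correspondence analysis of a nonnegative matrix
    (examinees x items, 1 = correct); returns principal inertias
    in decreasing order."""
    P = X / X.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals; subtracting outer(r, c) removes
    # the trivial (constant) dimension.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, _ = np.linalg.svd(S, full_matrices=False)
    return sv ** 2                     # principal inertias

# Tiny made-up response matrix: 6 examinees x 4 items (1 = correct).
X = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [1, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)
inertias = correspondence_analysis(X)
cumulative_pct = inertias[:3].sum() / inertias.sum() * 100
print(np.round(inertias, 3), round(cumulative_pct, 1))
```

The cumulative percentage of the leading inertias plays the same role here as the "cumulative percentage of eigenvalues up to the third dimension" discussed in the abstract.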
The multiple-choice (MC) format is the most widely used format in objective testing. The “select all the choices that are true” item is one variation of the MC format. This format provides no instructions indicating the number of correct choices. Although many studies have developed and compared scoring methods for this type of item, the results have often been inconsistent.
Most scoring methods that have been developed are based on the number of choices correctly selected. In this study, we treated the response patterns of examinees as binary variables and proposed new scoring methods based on the similarity or degree of association between response patterns and key patterns. The two proposed methods, along with the multiple true-false (MTF) method and the negative marking (NM) method, were compared and their characteristics clarified. Among these methods, the Jaccard index method was considered appropriate from the viewpoints of score diversity and calculation simplicity. The results also showed that the response patterns receiving high scores were basically identical across methods.
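The Jaccard-index scoring favored above compares the set of selected choices with the set of keyed choices: the score is the size of their intersection divided by the size of their union. A minimal sketch (illustrative patterns, not the study's items):

```python
def jaccard_score(response, key):
    """Jaccard index between a response pattern and the key pattern,
    both given as binary tuples over an item's choices."""
    selected = {i for i, r in enumerate(response) if r}
    keyed = {i for i, k in enumerate(key) if k}
    if not selected and not keyed:
        return 1.0  # degenerate case: nothing selected, nothing keyed
    return len(selected & keyed) / len(selected | keyed)

key = (1, 0, 1, 1, 0)                        # choices 1, 3, 4 are true
print(jaccard_score((1, 0, 1, 1, 0), key))   # exact match → 1.0
print(jaccard_score((1, 0, 1, 0, 0), key))   # one keyed choice missed → 2/3
print(jaccard_score((0, 1, 0, 0, 1), key))   # only distractors selected → 0.0
```

Unlike simple counts of correctly selected choices, this index penalizes both omissions and extra selections through the union in the denominator, while remaining trivial to compute.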
The raw scores obtained from the rating-scale method reflect not only the construct of interest but also the response styles of the respondents. The method of anchoring vignettes was developed to distinguish between the two. A method for analyzing anchoring vignettes data based on multidimensional item response theory (MIRT) was proposed recently; it has the advantage of being grounded in well-established modern test theory. The current study extends this framework in the following manner. First, an improved statistical model-selection procedure is introduced, based on the Watanabe-Akaike information criterion (WAIC) and leave-one-out cross-validation using Pareto-smoothed importance sampling. Second, the Hamiltonian Monte Carlo estimation algorithm, which has a numerical advantage in complex models such as the current one, is introduced. Third, two empirical datasets are comparatively analyzed using the proposed method. The results consistently indicate the utility of bias correction based on anchoring vignettes and the MIRT model. The study also discusses the importance of correcting raw scores and the usefulness of the MIRT-based model.
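WAIC, one of the model-selection criteria mentioned above, can be computed directly from a matrix of pointwise log-likelihoods over posterior draws. A minimal sketch with simulated numbers (purely illustrative, not the study's model or data):

```python
import math

def waic(log_lik):
    """WAIC from pointwise log-likelihoods, where log_lik[s][i] is the
    log-likelihood of observation i under posterior draw s:
      lppd   = sum_i log( mean_s exp(log_lik[s][i]) )
      p_waic = sum_i var_s( log_lik[s][i] )
      WAIC   = -2 * (lppd - p_waic)   (lower is better)"""
    S = len(log_lik)
    n = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n):
        draws = [log_lik[s][i] for s in range(S)]
        lppd += math.log(sum(math.exp(d) for d in draws) / S)
        mean = sum(draws) / S
        p_waic += sum((d - mean) ** 2 for d in draws) / (S - 1)
    return -2.0 * (lppd - p_waic)

# Simulated log-likelihoods: 4 posterior draws x 3 observations
# (illustrative numbers only).
log_lik = [
    [-1.2, -0.8, -1.5],
    [-1.1, -0.9, -1.4],
    [-1.3, -0.7, -1.6],
    [-1.0, -1.0, -1.3],
]
print(round(waic(log_lik), 3))
```

In practice the draws come from an MCMC sampler (here, Hamiltonian Monte Carlo), and models with lower WAIC or better PSIS-LOO estimates are preferred.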
Web-based online exams are becoming popular, driven by the growing number of computers connected to the Internet. The major benefits of online exams are that they improve efficiency and reduce costs by lightening examiners' workloads, and that examinees can take them anytime and from anywhere. This paper proposes a method for detecting cheating by examinees using acoustic devices, such as in-ear earphones, in unproctored online exams such as computerized take-at-home exams. Such technology is expected to raise the reliability of assessment, thus improving fairness in online exams. We employ an eye tracker to capture the gaze features of examinees and use them to detect cheating with acoustic devices. In the experiment, subjects performed both a reading task and a dual task in which they pretended to read a dummy text while carrying out a listening task. The subjects' gaze behaviors in the two conditions were compared to verify the effectiveness of the proposed method.