Japanese Journal for Research on Testing

From the Measurement between Individual Comparisons To the Measurement within Individual Changes

～ A Need of Paradigm Shift for Testing Research and Practice ～

Hiroshi Ikeda

2008 Volume 4 Issue 1 Pages 3-12
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_3

JOURNAL FREE ACCESS

Mainstream testing practice in Japan has focused primarily on relative comparison of individual differences. Tests have been widely used, with effective results, for grading individual attainment in school selection and promotion in industry, for assessing aptitude in guidance, and so forth. Test scores can be used to order persons by their relative standing within the same group of test takers. However, scores derived from different tests or from different groups of test takers cannot be directly compared with each other. Ordinal test scores, mostly raw scores or Z-scores, do not tell us what we really want to know: the amount of change in ability due to growth and learning. To assess change in individual or group performance reliably, we need advanced theory and technology of test development. Necessary conditions and requirements for future Japanese testing practice are discussed with reference to modern test technologies such as IRT, test equating, CBT, and item banking.

An exploratory method for determining measurement design based on the posterior predictive distribution

Taichi Okumura

2008 Volume 4 Issue 1 Pages 13-21
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_13

JOURNAL FREE ACCESS

In this article, an exploratory method for determining a repeated measurement design based on the posterior predictive distribution is proposed. By repeatedly sampling future observations from the posterior predictive distribution, we can determine the measurement design needed for the mean range of the confidence interval of true scores to fall within a prespecified value with a certain probability. This method takes into account uncertainty about the true parameter values and can be carried out with introductory programming skills. It is applicable to various situations in psychological research, although computation may take a long time under certain conditions.

An Investigation of an Oral Placement Test of Japanese Language Using Generalizability Theory

Eri Banno

2008 Volume 4 Issue 1 Pages 23-32
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_23

JOURNAL FREE ACCESS

This study investigates the potential role of generalizability theory in examining oral performance tests. Its purposes are to examine the contributions of candidates, raters, tasks, and their interactions to the variance of test scores, and to find an optimal number of raters and tasks for the test using generalizability theory. Sixty-one JSL teachers evaluated six Chinese students' oral tests, each of which consisted of three tasks. The results of the analysis indicated that the test worked well for spreading candidates out along a continuum of oral proficiency. However, with a one-rater and one-task design, some extraneous effects on test scores that could be a source of measurement error were found, and the results indicated that more raters and tasks are needed for the test to achieve higher reliability. As the optimal number of raters and tasks, the author suggested a two-rater and two-task design for this oral placement test. The study shows that generalizability theory is a powerful tool for investigating and developing oral performance tests.
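
As a rough illustration of the kind of decision study this abstract describes, the sketch below computes a generalizability coefficient for relative decisions under different numbers of raters and tasks. It assumes a fully crossed candidate x rater x task design, which the abstract does not state explicitly. The function name `g_coefficient` and all variance components are hypothetical placeholders, not values from the study.

```python
def g_coefficient(var_p, var_pr, var_pt, var_prt_e, n_raters, n_tasks):
    """Generalizability coefficient for relative decisions.

    Assumes a fully crossed person x rater x task design (an assumption).
    var_p     : variance due to candidates (persons)
    var_pr    : person-by-rater interaction variance
    var_pt    : person-by-task interaction variance
    var_prt_e : three-way interaction confounded with residual error
    """
    # Relative error variance shrinks as raters and tasks are added.
    rel_error = (var_pr / n_raters
                 + var_pt / n_tasks
                 + var_prt_e / (n_raters * n_tasks))
    return var_p / (var_p + rel_error)


# Hypothetical variance components, for illustration only.
components = dict(var_p=1.0, var_pr=0.2, var_pt=0.3, var_prt_e=0.5)
for n_r, n_t in [(1, 1), (2, 2), (3, 3)]:
    coef = g_coefficient(n_raters=n_r, n_tasks=n_t, **components)
    print(f"raters={n_r}, tasks={n_t}: coefficient={coef:.3f}")
```

Comparing the coefficient across candidate designs in this way is how one would weigh, for example, a one-rater one-task design against a two-rater two-task design.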

Relationship of learner's personality traits and learning styles to English reading and listening test scores

Teruhisa Uchida, Taketoshi Sugisawa, Kumiko Shiina

2008 Volume 4 Issue 1 Pages 41-52
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_41

JOURNAL FREE ACCESS

We investigated the relationship of learner's personality traits and learning styles to his or her English test scores. The 348 participants, all first-year university students, took an examination administered by the National Center for University Entrance Examinations. The English section included grammar and reading tests and listening comprehension tests. After completing the examination, each participant completed a questionnaire based on the Big Five personality scale and an inventory regarding the participant's styles of learning English. This learning style inventory consisted of three factors: improving communicative skills, inferring the meaning of unfamiliar terms on the basis of the context to grasp the main points, and focusing on vocabulary and grammar. Path analysis of personality traits, learning styles, and English test performances suggested that a learner's personality traits may affect his or her learning styles. Furthermore, the learning styles may affect his or her overall performance in English as well as the performance patterns between the grammar and reading score and the listening comprehension score.

Development and Practice of an Integrative e-Testing System

Pokpong Songmuang, Maomi Ueno

2008 Volume 4 Issue 1 Pages 53-64
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_53

JOURNAL FREE ACCESS

The purpose of this study is to develop a practical e-testing system that is consistently designed to unify the various functions of traditional computer-based testing systems. The system consists of an Item Authoring System, Item Bank, Test Delivery System, e-Testing Construction Support System, Test Database, Data Analysis System, and Adaptive Testing System. The advantages of the integrative system are that (1) the test data stored on the server are automatically distributed to each function and used for test analysis, item analysis, test construction, and adaptive testing, and (2) the system has various functions and can therefore be used for various test purposes (entrance examination, ability measurement, formative assessment, self-assessment, assessment in distance education, e-learning, and so on). Furthermore, evaluations from actual use of the system by several teachers show that it is not just a prototype but a system suited to practical use.

A Method to Estimate Examinee's Skill from Time-Series Motion Data

Hiroyuki Ogata, Saeko Yamamoto

2008 Volume 4 Issue 1 Pages 65-72
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_65

JOURNAL FREE ACCESS

Though performance testing is an effective way to assess examinees' skill in sports or manufacturing, its CBT implementation is not progressing. Taking the golf putting swing as an example, this paper discusses a method to assess the skill level of an examinee automatically from the examinee's motion data. In our previous paper, we used some characteristic postures extracted from the motion data for assessment. However, that method cannot take the timing of motion or the process between the postures into account. Here, we propose using a recurrent neural network (RNN) to deal with this problem. We applied the quasi-Newton method to accelerate the learning process, and the minimum description length principle to decide the network configuration. We verified the effectiveness of the proposed method by using actual examinees' motion data and assessing their skill with the RNN.

What factor moderates the effect of handwriting quality on essay test scoring

Investigation by meta-analysis and experiment

Satoshi Usami

2008 Volume 4 Issue 1 Pages 73-83
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_73

JOURNAL FREE ACCESS

Previous studies have not shown consistent results on whether handwriting quality affects essay test scoring. We hypothesized that the following factors may moderate the effect of handwriting quality: (1) the degree of freedom allowed in the answer, (2) the age of examinees, and (3) the skill of scorers. We then performed a meta-analysis to evaluate the effect of these factors. The result suggested that the younger the examinee, the larger the effect of handwriting quality. Based on this result, we hypothesized that the age factor effectively stands for essay quality, and that essay quality mediates the moderator effect of age. To test the hypothesis, we had 20 participants score essay tests with different levels of essay quality and handwriting quality. The result of a two-way ANOVA showed no interaction effect, which indicated that essay quality may not mediate the effect of handwriting quality.

An attempt of parameter estimation for the Rasch model by parallel Markov chain Monte Carlo

Yoshikazu Sato, Eiji Muraki

2008 Volume 4 Issue 1 Pages 85-100
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_85

JOURNAL FREE ACCESS

One purpose of this paper is to achieve automatic scale adjustment of the proposal distribution in the random-walk Metropolis-Hastings algorithm and automatic convergence detection of Markov chains. To this end, a parallel Markov chain Monte Carlo algorithm based on the idea suggested by Gelman, Roberts & Gilks (1996) is proposed. The remarkable feature of the proposed algorithm is that effective samples can be obtained immediately after the scale adjustment of the proposal distribution and the convergence detection of Markov chains are completed simultaneously. Another purpose of this paper is to apply the proposed algorithm to parameter estimation for the Rasch model, which is one of the item response models. Simulation results show that the item difficulties of the Rasch model can be estimated properly by the parallel single-component random-walk Metropolis-Hastings algorithm.
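
For readers unfamiliar with the basic building block named in this abstract, the following is a minimal sketch of a single-component random-walk Metropolis-Hastings sampler for Rasch item difficulties. It is not the authors' parallel algorithm: the proposal scale is fixed rather than adjusted automatically, there is no convergence detection, and person abilities are treated as known. The normal prior, the function names, and all default values are assumptions made for illustration.

```python
import numpy as np


def item_log_post(x_j, theta, b_j, prior_sd=2.0):
    """Log posterior (up to a constant) of one item difficulty b_j.

    x_j is the 0/1 response vector for item j, theta the abilities.
    The N(0, prior_sd^2) prior is an illustrative assumption.
    """
    logits = theta - b_j
    # Rasch log-likelihood: sum of x*logit - log(1 + exp(logit)).
    loglik = np.sum(x_j * logits - np.logaddexp(0.0, logits))
    return loglik - 0.5 * (b_j / prior_sd) ** 2


def rw_mh_rasch(responses, theta, n_iter=2000, scale=0.3, seed=0):
    """Update each item difficulty in turn with a random-walk proposal."""
    rng = np.random.default_rng(seed)
    n_items = responses.shape[1]
    b = np.zeros(n_items)
    draws = np.empty((n_iter, n_items))
    for t in range(n_iter):
        for j in range(n_items):
            proposal = b[j] + scale * rng.normal()
            log_ratio = (item_log_post(responses[:, j], theta, proposal)
                         - item_log_post(responses[:, j], theta, b[j]))
            # Symmetric proposal, so the acceptance ratio is the posterior ratio.
            if np.log(rng.uniform()) < log_ratio:
                b[j] = proposal
        draws[t] = b
    return draws
```

Given a persons-by-items 0/1 matrix `responses` and an ability vector `theta`, `rw_mh_rasch(responses, theta)` returns one row of sampled difficulties per iteration; early rows would normally be discarded as burn-in before averaging.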

Prediction of Pass Ratio of Students Taking New National Bar Examination for Each Law School based on National Admission Test for Law Schools

- Consideration for Two-Year and Three-Year Courses -

Kumiko Shiina, Taketoshi Sugisawa, Ken-ichiro Komaki, Katsumi Sakurai

2008 Volume 4 Issue 1 Pages 101-112
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_101

JOURNAL FREE ACCESS

The pass ratio of students taking the new national bar examination (Bar Exam) was predicted for each law school based on their scores on the National Admission Test for Law Schools (NATLaS). The previous model, which assumes that only students whose NATLaS score exceeds a common threshold value can pass the Bar Exam, was adapted to the recent situation. For students in a two-year course, a cumulative pass ratio of the Bar Exam is estimated for each enrollment year at each law school. The previous model succeeds in predicting the estimated values. The estimated threshold values of the NATLaS score for passing the Bar Exam in the shortest number of years are shown to be stable across enrollment years. For students in a three-year course, the previous model is modified by adding another threshold value of the NATLaS score. Students whose NATLaS scores are higher than this threshold value were assumed to have sufficient ability to take the Bar Exam in three years. The modified model indicates that students enrolled in a three-year course required much better NATLaS scores to pass the Bar Exam than those in a two-year course.

An analysis of test data including a multiple-choice multiple-answer item using the nominal categories model

Tomoya Okubo, Kojiro Shojima, Tomoichi Ishizuka

2008 Volume 4 Issue 1 Pages 125-133
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_125

JOURNAL FREE ACCESS

In this research, test data including a multiple-choice multiple-answer item were analysed using the nominal categories model. The results revealed that response probabilities for the answer to each item are described as a function on a latent trait scale. Further, the results show that some of the items include attractive distractor choices. We also found that the nominal categories model is useful not only for multiple-choice items but also for multiple-choice multiple-answer items. In addition to the analysis using the nominal categories model, we analysed the multiple-choice multiple-answer items using a binary response model and compared the two sets of results. According to the comparison, using the nominal categories model to analyse this type of item yielded the most information.
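
To make concrete what it means for response probabilities to be "described as a function on a latent trait scale", the sketch below evaluates the usual multinomial-logit form of the nominal categories model, in which each response category k has a slope a_k and an intercept c_k and its probability is exp(a_k * theta + c_k) divided by the sum of that term over all categories. The function name and the parameter values are hypothetical and are not taken from the study.

```python
import numpy as np


def nominal_category_probs(theta, slopes, intercepts):
    """Category response probabilities at ability level theta.

    slopes and intercepts hold one value per response category
    (hypothetical values are used in the example below).
    """
    z = (np.asarray(slopes, dtype=float) * theta
         + np.asarray(intercepts, dtype=float))
    z -= z.max()  # subtract the maximum for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()


# Illustrative parameters for a three-category item.
slopes = [-1.0, 0.0, 1.0]
intercepts = [0.0, 0.5, -0.5]
for theta in (-2.0, 0.0, 2.0):
    print(theta, np.round(nominal_category_probs(theta, slopes, intercepts), 3))
```

Plotting these probabilities against theta for each category gives the trace lines from which an attractive distractor can be spotted: a wrong category whose probability stays high over part of the ability range.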

On Effectiveness and Limitation of Score Adjustment for Selective Testings

An Approach for Evaluation with Pass-Fail Swapping Simulation

Naoki T. Kuramoto, Dai Nishigori, Takuya Kimura, Yasuo Morita, Osamu K ...

2008 Volume 4 Issue 1 Pages 135-152
Published: 2008
Released on J-STAGE: June 04, 2022

DOI: https://doi.org/10.24690/jart.4.1_135

JOURNAL FREE ACCESS

In this paper we attempt a theoretical explanation of the method for evaluating score adjustment using pass-fail swapping simulations proposed by Kuramoto et al. (2008). We also conducted a case study using real admission data. Statistical equating methods cannot be applied directly to scores obtained from achievement tests in subject areas with subject options. Score adjustment has occasionally been carried out to convert raw scores so as to even out means after the fact. Score adjustment also raises social issues. We tried to reconcile the conflicting public opinions about past score adjustment cases. Fairness theories in social psychology help us understand what people feel when extremely biased scores emerge from different options. In the case of individual university examinations, we directed our attention to the results of selection rather than to scores. Swapping simulation is a promising way to deal with this. An adjustment is judged to be fair if the swap rates are consistent regardless of subject options. Swapping simulation was carried out to evaluate the score adjustment method used for optional tests in science in the Tohoku University entrance examinations. The result suggested problematic consequences, even though the adjustment seemed successful on swap-rate indices.
