Japanese Journal for Research on Testing
Online ISSN : 2433-7447
Print ISSN : 1880-9618
Volume 4, Issue 1
  • ~ A Need of Paradigm Shift for Testing Research and Practice ~
    Hiroshi Ikeda
    2008 Volume 4 Issue 1 Pages 3-12
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    Major Japanese testing practice has focused primarily on the relative comparison of individual differences. Testing has been widely used and has produced effective outcomes for grading individual attainment in school selection and promotion in industry, assessing aptitude in guidance, and so forth. Test scores can be used to order persons by their relative standing within the same group of test takers. However, test scores derived from different tests or from different groups of test takers cannot be compared directly with each other. Ordinal test scores, mostly raw or Z-scores, do not tell us the amount of change in ability due to growth and learning, which is what we really want to know. To assess change in individual or group performance reliably, we need to use advanced theory and technology of test development. Necessary conditions and requirements for future Japanese testing practice are discussed with reference to modern test technologies such as IRT, test equating, CBT, and item banking.

  • Taichi Okumura
    2008 Volume 4 Issue 1 Pages 13-21
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    In this article, an exploratory method for determining a repeated measurement design based on the posterior predictive distribution is proposed. By repeatedly sampling future observations from the posterior predictive distribution, we can determine the measurement design needed for the mean range of the confidence interval of true scores to fall within a prespecified value with a certain probability. This method takes into account uncertainty about the true parameter values and can be carried out with introductory programming skills. The method is applicable to various situations in psychological research, although computation may take a long time under certain conditions.
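
    The sampling idea can be illustrated with a minimal Python sketch. It is not the author's procedure: it assumes normally distributed measurement errors, a pilot variance estimate with known degrees of freedom, a noninformative prior on the error variance, and a normal-approximation 95% interval. The names `assurance`, `smallest_design`, `pilot_var`, `pilot_df`, and `target_width` are hypothetical.

```python
import math
import random


def assurance(n, pilot_var, pilot_df, target_width, n_sims=4000, seed=0):
    """Estimate P(95% interval width <= target_width) for n repeated measurements.

    Assumptions (illustrative only): normal errors, and a scaled
    inverse-chi-square posterior for the error variance from pilot data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw the error variance from its posterior: df * s^2 / chi2(df).
        chi2 = rng.gammavariate(pilot_df / 2.0, 2.0)
        sigma = math.sqrt(pilot_df * pilot_var / chi2)
        # Draw n future observations from the posterior predictive distribution
        # (the mean is set to 0 because it does not affect the interval width).
        future = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(future) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in future) / (n - 1))
        # Width of a normal-approximation 95% interval for the mean score.
        width = 2.0 * 1.96 * sd / math.sqrt(n)
        hits += width <= target_width
    return hits / n_sims


def smallest_design(candidates, pilot_var, pilot_df, target_width, prob=0.8):
    """Return the smallest candidate n whose assurance reaches prob, or None."""
    for n in sorted(candidates):
        if assurance(n, pilot_var, pilot_df, target_width) >= prob:
            return n
    return None
```

    Calling `smallest_design([5, 10, 20, 50], pilot_var=1.0, pilot_df=20, target_width=1.0)` scans the candidate numbers of measurements in increasing order. Averaging over posterior draws, rather than plugging in a point estimate, is what lets the uncertainty about the true variance carry through to the design decision.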

  • Eri Banno
    2008 Volume 4 Issue 1 Pages 23-32
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    This study investigates the potential roles of generalizability theory in investigating oral performance tests. The purposes of this study are to examine the contributions of candidates, raters, tasks, and their interactions to the variance of test scores and to find an optimal number of raters and tasks for the test using generalizability theory. Sixty-one JSL teachers evaluated six Chinese students' oral tests, which consisted of three tasks. The results of the analysis indicated that the test worked well for spreading candidates out along a continuum of oral proficiency. However, with a one-rater, one-task design, some extraneous effects on test scores that could be a source of measurement error were found, and the results indicated that more raters and tasks are needed for the test to achieve higher reliability. As the optimal number of raters and tasks, the author suggested a two-rater, two-task design for this oral placement test. The study shows that generalizability theory is a powerful tool for investigating and developing oral performance tests.
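
    The trade-off between the number of raters and tasks can be sketched in Python using the standard relative generalizability coefficient for a fully crossed candidate x rater x task design. The variance components below are hypothetical placeholders, not the estimates reported in the study, and the function name `g_coefficient` is an assumption made for illustration.

```python
def g_coefficient(var_p, var_pr, var_pt, var_prt_e, n_raters, n_tasks):
    """Relative generalizability coefficient for a crossed p x r x t design."""
    # Relative error variance: interactions involving candidates,
    # each divided by the number of conditions averaged over.
    relative_error = (
        var_pr / n_raters
        + var_pt / n_tasks
        + var_prt_e / (n_raters * n_tasks)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components, for illustration only.
components = dict(var_p=1.0, var_pr=0.5, var_pt=0.5, var_prt_e=1.0)
for n_r, n_t in [(1, 1), (2, 2), (3, 3)]:
    rho = g_coefficient(**components, n_raters=n_r, n_tasks=n_t)
    print(f"raters={n_r} tasks={n_t} G={rho:.3f}")
```

    Because every error term is divided by the number of raters or tasks, adding either one raises the coefficient, which is the mechanism behind preferring a multi-rater, multi-task design over a one-rater, one-task design.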

  • Teruhisa Uchida, Taketoshi Sugisawa, Kumiko Shiina
    2008 Volume 4 Issue 1 Pages 41-52
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    We investigated the relationship of a learner's personality traits and learning styles to his or her English test scores. The 348 participants, all first-year university students, took an examination administered by the National Center for University Entrance Examinations. The English section included grammar and reading tests and listening comprehension tests. After completing the examination, each participant completed a questionnaire based on the Big Five personality scale and an inventory of the participant's styles of learning English. This learning style inventory consisted of three factors: improving communicative skills, inferring the meaning of unfamiliar terms from the context to grasp the main points, and focusing on vocabulary and grammar. Path analysis of personality traits, learning styles, and English test performance suggested that a learner's personality traits may affect his or her learning styles. Furthermore, the learning styles may affect his or her overall performance in English as well as the pattern of performance between the grammar and reading score and the listening comprehension score.

  • Pokpong Songmuang, Maomi Ueno
    2008 Volume 4 Issue 1 Pages 53-64
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    The purpose of this study is to develop a practical e-testing system designed to unify the various functions of traditional computer-based testing systems in a consistent way. The system consists of an Item Authoring System, Item Bank, Test Delivery System, e-Testing Construction Support System, Test Database, Data Analysis System, and Adaptive Testing System. The integrated system has two advantages: (1) the test data stored on the server are automatically distributed to each function and used for test analysis, item analysis, test construction, and adaptive testing; and (2) because the system has various functions, it can be used for various testing purposes (entrance examination, ability measurement, formative assessment, self-assessment, assessment in distance education, e-learning, and so on). Furthermore, evaluations by several teachers who used the system in actual practice show that it is not just a prototype but a system ready for actual practical use.

  • Hiroyuki Ogata, Saeko Yamamoto
    2008 Volume 4 Issue 1 Pages 65-72
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    Although performance testing is an effective way to assess examinees' skill in sports or manufacturing, its CBT implementation has made little progress. Taking the golf putting swing as an example, this paper discusses a method for automatically assessing an examinee's skill level from his or her motion data. In our previous paper, we used some characteristic postures extracted from the motion data for assessment. However, that method cannot take into account the timing of the motion or the process between the postures. Here, we propose using a recurrent neural network (RNN) to deal with this problem. We applied the quasi-Newton method to accelerate the learning process and the minimum description length principle to decide the network configuration. We verified the effectiveness of the proposed method by assessing the skill of actual examinees from their motion data with the RNN.

  • Investigation by meta-analysis and experiment
    Satoshi Usami
    2008 Volume 4 Issue 1 Pages 73-83
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    Previous studies have not shown consistent results on whether handwriting quality affects essay test scoring. We hypothesized that the following factors may moderate the effect of handwriting quality: (1) the degree of freedom allowed in the answer, (2) the age of the examinees, and (3) the skill of the scorers.

    We then performed a meta-analysis to evaluate the effect of these factors. The results suggested that the younger the examinee, the larger the effect of handwriting quality. Based on this result, we hypothesized that the age factor effectively reflects essay quality, and that essay quality mediates the moderating effect of age. To test this hypothesis, we had 20 participants score essay tests with different levels of essay quality and handwriting quality. A two-way ANOVA showed no interaction effect, which indicated that essay quality may not mediate the effect of handwriting quality.

  • Yoshikazu Sato, Eiji Muraki
    2008 Volume 4 Issue 1 Pages 85-100
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    One purpose of this paper is to achieve automatic scale adjustment of the proposal distribution in the random-walk Metropolis-Hastings algorithm and automatic convergence detection of Markov chains. To this end, a parallel Markov chain Monte Carlo algorithm based on the idea suggested by Gelman, Roberts & Gilks (1996) is proposed. The remarkable feature of the proposed algorithm is that effective samples can be obtained as soon as the scale adjustment of the proposal distribution and the convergence detection of the Markov chains, which are carried out simultaneously, are complete. Another purpose of this paper is to apply the proposed algorithm to parameter estimation for the Rasch model, which is one of the item response models. Simulation results show that the item difficulties of the Rasch model can be estimated properly by the parallel single-component random-walk Metropolis-Hastings algorithm.
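
    For orientation, a single-chain, single-component random-walk Metropolis sampler for Rasch item difficulties can be sketched in Python as follows. This is a simplified illustration, not the proposed algorithm: it uses a fixed proposal scale and a fixed burn-in instead of automatic scale adjustment and convergence detection, runs one chain rather than parallel chains, and treats person abilities as known. The normal prior and all function names are assumptions.

```python
import math
import random


def log_post(b, thetas, responses, prior_sd=2.0):
    """Log posterior of one item difficulty b (Rasch model, normal prior)."""
    lp = -0.5 * (b / prior_sd) ** 2
    for theta, x in zip(thetas, responses):
        eta = theta - b
        # log P(x | theta, b) = x * eta - log(1 + exp(eta))
        lp += x * eta - math.log1p(math.exp(eta))
    return lp


def rw_metropolis(thetas, data, n_iter=1500, burn_in=500, scale=0.3, seed=0):
    """Return posterior-mean difficulties, updating one item at a time."""
    rng = random.Random(seed)
    n_items = len(data[0])
    cols = [[row[j] for row in data] for j in range(n_items)]
    b = [0.0] * n_items
    cur = [log_post(b[j], thetas, cols[j]) for j in range(n_items)]
    sums = [0.0] * n_items
    for it in range(n_iter):
        for j in range(n_items):
            # Symmetric random-walk proposal for component j only.
            prop = b[j] + rng.gauss(0.0, scale)
            lp = log_post(prop, thetas, cols[j])
            # Accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, lp - cur[j])):
                b[j], cur[j] = prop, lp
        if it >= burn_in:
            for j in range(n_items):
                sums[j] += b[j]
    return [s / (n_iter - burn_in) for s in sums]


def simulate(thetas, difficulties, seed=1):
    """Simulate 0/1 responses from the Rasch model."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < 1.0 / (1.0 + math.exp(-(t - b))) else 0
         for b in difficulties]
        for t in thetas
    ]
```

    The fixed `scale` argument is exactly the quantity that has to be hand-tuned in this sketch; too small a value gives slow exploration and too large a value gives frequent rejections, which is the motivation for adjusting it automatically.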

  • - Consideration for Two-Year and Three-Year Courses -
    Kumiko Shiina, Taketoshi Sugisawa, Ken-ichiro Komaki, Katsumi Sakurai
    2008 Volume 4 Issue 1 Pages 101-112
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    The pass ratio on the new national bar examination (Bar Exam) for the students of each law school was predicted from their scores on the National Admission Test for Law Schools (NATLaS). The previous model, which assumes that only students whose NATLaS scores exceed a common threshold value can pass the Bar Exam, was adapted to the recent situation. For students in the two-year course, a cumulative pass ratio on the Bar Exam is estimated for each enrollment year at each law school. The previous model succeeds in predicting the estimated values. The estimated NATLaS threshold values for passing the Bar Exam in the shortest number of years are shown to be stable across enrollment years. For students in the three-year course, the previous model is modified by adding another NATLaS threshold value. Students whose NATLaS scores are higher than this threshold were assumed to have sufficient ability to take the Bar Exam in three years. The modified model indicates that students enrolled in the three-year course required much better NATLaS scores to pass the Bar Exam than those in the two-year course.

  • Tomoya Okubo, Kojiro Shojima, Tomoichi Ishizuka
    2008 Volume 4 Issue 1 Pages 125-133
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    In this research, test data including a multiple-choice multiple-answer item were analysed using the nominal categories model. The results revealed that the response probabilities for the answers to each item can be described as a function of the latent trait. The results further show that some of the items contain attractive distractor choices. We also found that the nominal categories model is useful not only for multiple-choice items but also for multiple-choice multiple-answer items. In addition to the analysis using the nominal categories model, we analysed the multiple-choice multiple-answer items using a binary response model and compared the two sets of results. According to the comparison, using the nominal categories model to analyse this type of item yielded the most information.
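
    The way response probabilities depend on the latent trait under the nominal categories model can be sketched in Python. The sketch uses the standard form of the model, in which category k has probability proportional to exp(a_k * theta + c_k), together with the corresponding item information. The function names and any parameter values passed to them are hypothetical and are not taken from the study.

```python
import math


def category_probs(theta, slopes, intercepts):
    """Nominal categories model: softmax of a_k * theta + c_k over categories."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    z_max = max(z)  # subtract the maximum for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def item_information(theta, slopes, intercepts):
    """Item information: variance of the slopes under the category probabilities."""
    p = category_probs(theta, slopes, intercepts)
    mean_a = sum(a * pk for a, pk in zip(slopes, p))
    return sum(a * a * pk for a, pk in zip(slopes, p)) - mean_a ** 2
```

    Plotting `category_probs` over a grid of theta values for one item shows which distractors attract examinees at which trait levels. Collapsing the categories into correct and incorrect discards the differences among the distractor slopes, which is one way to see why a binary scoring of the same item cannot carry more information than the nominal analysis.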

  • An Approach for Evaluation with Pass-Fail Swapping Simulation
    Naoki T. Kuramoto, Dai Nishigori, Takuya Kimura, Yasuo Morita, Osamu K ...
    2008 Volume 4 Issue 1 Pages 135-152
    Published: 2008
    Released on J-STAGE: June 04, 2022
    JOURNAL FREE ACCESS

    In this paper we attempt a theoretical explanation of the method for evaluating score adjustment using pass-fail swapping simulations proposed by Kuramoto et al. (2008). We also conducted a case study using real admission data. Statistical equating methods cannot be applied directly to scores obtained from achievement tests in subject areas with subject options. Score adjustment has occasionally been carried out after the fact to convert raw scores so as to adjust the means. Score adjustment also gives rise to social issues. We tried to reconcile the divergent public opinions over past score adjustment cases. Fairness theories in social psychology help us to understand what people feel when extremely biased scores emerge from different options. In the case of individual university examinations, we directed our attention to the results of selection rather than to scores. Swapping simulation is a promising way to deal with this. An adjustment is judged to be fair if the swap rates are consistent regardless of subject options. Swapping simulation was carried out to evaluate the score adjustment method used for the optional science tests in the Tohoku University entrance examinations. The results suggested problematic consequences, even though the adjustments seemed successful on the swap-rate indices.
