Japanese Journal for Research on Testing

Confidence Level Estimation in Neural Automatic Scoring Using Multitask Learning of Regression and Classification

Yuto Takahashi, Masaki Uto

2024 Volume 20 Issue 1 Pages 1-22
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_1

JOURNAL FREE ACCESS

Essay or short-answer questions are commonly used in various assessments. However, especially in large-scale examinations, the substantial cost of scoring and the reduced reliability of evaluations due to rater biases can be problematic. To overcome these challenges, automatic scoring models using machine learning technologies have gained significant attention. In recent years, a variety of deep neural network-based automatic scoring models have been proposed, achieving high accuracy. However, even the most accurate current neural scoring models are still prone to scoring errors, which poses a barrier to their adoption in high-stakes examinations. To address this issue, some recent studies have explored automatic scoring models that provide confidence levels alongside score predictions. This study introduces a novel neural scoring model that improves upon such conventional models, offering enhanced performance in both confidence estimation and score prediction. Experiments with actual data demonstrate that our model achieves equal or superior performance to previous models in both score prediction and confidence estimation.

Prospects of viewpoint-based assessment for learning process and its utilization in university entrance examinations

A survey for high school teachers charged in career guidance and educational affairs

Takuya Nagano, Yuto Terashima, Haruna Tachibana, Hidetoki Ishii

2024 Volume 20 Issue 1 Pages 23-42
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_23

JOURNAL FREE ACCESS

Following the revised curriculum guidelines announced in March 2018, student guidance records in high schools and secondary schools now include, for each subject, columns containing scores from viewpoint-based assessments of the learning process. Students who enrolled in 2022 or later are assessed with this new method, and their evaluation reports are written based on these scores. Because this evaluation aims to assess students' learning processes from different viewpoints and to capture their levels of achievement analytically, these scores could be used in university entrance examinations, where multifaceted and comprehensive evaluation is expected. In this paper, we conducted a questionnaire survey of high school and secondary school teachers regarding this assessment at the end of the 2022 school year, the first year of its implementation, and investigated the possibility of using these scores in university entrance examinations, along with the assessment's achievements and problems. The results showed that even teachers in schools already employing this assessment considered that it has several problems to be solved and that improvements are necessary before it can be used in actual university entrance examinations.

Our experience of crises in the recent university entrance examinations

Pandemics, natural disasters, stabbing, cheating and other unforeseen risks

Takahiro Terao, Teruhisa Uchida, Hidetoki Ishii, Atsuhiro Hayashi, Hir ...

2024 Volume 20 Issue 1 Pages 43-71
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_43

JOURNAL FREE ACCESS

This study aimed to review the policies and procedures employed in response to various crises that recent university entrance examinations in Japan have faced: COVID-19, the eruption of a submarine volcano in the southern Pacific Ocean, a stabbing outside the University of Tokyo, and cheating with technology. These incidents revealed our unconscious assumptions and problems with the national university admissions system. We present basic principles and guidelines for crisis response and draw implications for establishing university admissions systems that are resilient in the face of unforeseen risks.

Accuracy of Re-estimation of the National Assessment of Academic Ability (Parent Questionnaire and Long-Term Trend Survey) Using Multidimensional Item Response Theory and Plausible Values

Toshiaki Kawaguchi

2024 Volume 20 Issue 1 Pages 73-89
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_73

JOURNAL FREE ACCESS

In the "Parent Questionnaire" and "Long-Term Trend Survey" of the National Assessment of Academic Ability conducted by the Ministry of Education, Culture, Sports, Science and Technology, not all data from the Parent Questionnaire can be used for analysis. This is because the data from the Long-Term Trend Survey are sampled by subject area. To overcome this problem, a method has been proposed to re-estimate academic ability by combining data from the Parent Questionnaire, the Long-Term Trend Survey, and the National Assessment of Academic Ability using multidimensional item response theory and plausible values. In this article, I confirm the accuracy of this estimation method through a simulation study.

In conclusion, I found that this method can effectively recover population parameters such as mean and standard deviation of subpopulations, correlation coefficients between socioeconomic status (SES) and academic achievement, as well as coefficients derived from regression analyses.

Deficiencies and Support of Thinking Process in the Use of Learning Strategy “Lesson Induction” to Promote Learning from Errors

Satomi Shiba, Yuri Uesaka

2024 Volume 20 Issue 1 Pages 91-109
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_91

JOURNAL FREE ACCESS

A learning strategy called "Lesson Induction" facilitates learning from errors after problem-solving. However, previous studies have reported that some learners struggle to induce effective lessons that they can use for subsequent problem-solving. The present study used qualitative methods to examine the deficiencies in thinking that lead to the induction of ineffective lessons and how to address them. Ten eighth-grade students participated in this study. First, students used Lesson Induction after mathematical problem-solving and reported what they thought while inducing their lessons. A protocol analysis of the students who induced ineffective lessons suggested three types of deficiencies, such as not sufficiently comparing their own solution with the correct answer. Second, the researcher provided interactive support and asked students to induce their lessons again. The protocol analysis results suggested that learners could induce higher-quality lessons by comparing their own approach to the problem with the approach behind the correct answer and then analyzing the reasons for their errors. Finally, we discussed the thought process that leads to the induction of effective lessons for problem-solving.

Assessment of the reliability and validity of psychological scale items generated by the ChatGPT

Kenichi Sasaki, Hideki Toyoda

2024 Volume 20 Issue 1 Pages 111-133
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_111

JOURNAL FREE ACCESS

Item creation in psychology is a critical factor that significantly influences the outcome of research. The task of creating appropriate and reliable scale items requires a substantial amount of time and effort, presenting a challenge for many researchers. In recent years, with the advancement of AI, its application has expanded across various fields. However, the utilization of AI in creating psychological scale items has not yet been fully explored. This study attempts to generate psychological scale items using ChatGPT, assessing their reliability and validity based on actual response data. The results indicate that item generation using ChatGPT is promising. Furthermore, considerations for designing prompts to generate high-quality items were also discussed. This research presents new possibilities for psychological measurement methods, offering directions to enhance both efficiency and quality in scale development.

A Selective Review on Novel Dimensionality Assessment Methods for Measurement Model Users

Kazuki Hori, Naomichi Makino

2024 Volume 20 Issue 1 Pages 135-167
Published: 2024
Released on J-STAGE: June 30, 2024

DOI: https://doi.org/10.24690/jart.20.1_135

JOURNAL FREE ACCESS

Determining the “correct” dimensionality is crucial in data analysis using measurement models such as factor analysis and item response theory. This matters because parameter estimates can be substantially biased when dimensionality is misspecified, especially in the case of underfactoring. Various methods for assessing dimensionality have been proposed; however, no gold standard has been established. Moreover, new methods continue to emerge in the literature, drawing on novel theories and techniques from research areas beyond psychological and educational measurement. This methodological proliferation makes it difficult for applied researchers to keep up with state-of-the-art methods. The present study therefore aims to bridge this gap by offering a selective review of novel dimensionality assessment methods, with a particular focus on (a) parallel analysis and its variants, (b) the model selection approach, (c) the network psychometrics approach, and (d) the machine learning approach. We discuss the advantages and disadvantages of these approaches and suggest potential directions for future research.
