Article ID: 2024EDP7189
Automatic essay scoring is a crucial task aimed at alleviating the workload of essay graders. Most previous studies have focused on English essays, primarily because extensive scored essay datasets are available for English. It therefore remains uncertain whether models developed for English are applicable to smaller-scale Japanese essay datasets. Recent studies have demonstrated the successful application of BERT-based regression and ranking models. Meanwhile, downloadable Japanese GPT models, which are larger than BERT, have become available, and it is unclear which types of modeling are appropriate for Japanese essay scoring. In this paper, we explore various aspects of modeling with GPTs, including the type of model (i.e., classification or regression), the size of the GPT model, and the training approach (e.g., learning from scratch versus continual pre-training). In experiments on Japanese essay datasets, we demonstrate that classification models combined with soft labels score Japanese essays more effectively than simple classification models. Regarding model size, we show that smaller GPT models can produce better results depending on the model, the prompt type, and the essay theme.
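The abstract pairs classification models with soft labels; the paper's exact formulation is not given here, but a common realization replaces each one-hot score target with a distribution that spreads probability mass over neighboring score classes, so near-miss predictions are penalized less than distant ones. The sketch below is one such construction under illustrative assumptions (a Gaussian smoothing kernel, a hypothetical `sigma` bandwidth, and a 0-10 score scale), not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def soft_labels(gold_scores: torch.Tensor, num_classes: int,
                sigma: float = 1.0) -> torch.Tensor:
    """Turn integer gold scores into Gaussian-smoothed target distributions.

    Each row places most mass on the gold score class and decaying mass on
    adjacent scores; `sigma` (an assumed hyperparameter) controls the spread.
    """
    classes = torch.arange(num_classes, dtype=torch.float32)        # (C,)
    diff = classes.unsqueeze(0) - gold_scores.unsqueeze(1).float()  # (B, C)
    weights = torch.exp(-0.5 * (diff / sigma) ** 2)                 # Gaussian kernel
    return weights / weights.sum(dim=1, keepdim=True)               # normalize rows

def soft_label_loss(logits: torch.Tensor, gold_scores: torch.Tensor,
                    sigma: float = 1.0) -> torch.Tensor:
    """Divergence between the model's predicted distribution and soft targets."""
    targets = soft_labels(gold_scores, logits.size(1), sigma)
    return F.kl_div(F.log_softmax(logits, dim=1), targets, reduction="batchmean")

# Example: a batch of 3 essays scored on a 0-10 scale (11 classes).
logits = torch.randn(3, 11)      # raw classification-head outputs
gold = torch.tensor([2, 7, 10])  # human-assigned scores
loss = soft_label_loss(logits, gold, sigma=1.0)
print(loss.item())
```

With hard one-hot targets, predicting 6 instead of 7 costs as much as predicting 0; the smoothed targets make the loss sensitive to the ordinal structure of essay scores, which is one plausible reason soft labels help over simple classification.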