In foreign language learning, writing tasks play a crucial role in developing and assessing learners’ language abilities, but manual scoring requires significant time and effort. Automated essay scoring (AES) is a way to mitigate this problem. Although human raters consider grammatical items and their difficulties as clues for judging learners’ proficiency levels while scoring essays, it is unclear whether current state-of-the-art AES models, which use BERT-based essay representations, take these factors into account. In this paper, we propose to incorporate grammatical features into BERT-based AES models in three ways: (1) using grammatical features as additional model inputs, (2) performing multi-task learning (MTL) with holistic and grammar scores while using grammatical features as model inputs, and (3) reconstructing grammatical features through MTL with holistic scores. For the grammatical features, we model learners’ grammar usage with item response theory (IRT), which estimates learners’ grammar abilities and the characteristics of grammatical items, including their difficulties, from essay data without teacher labels. The experimental results show that grammatical features improve scoring performance and that MTL with holistic and grammar scores brings further improvements. We also show that weighting grammatical items by their IRT-estimated difficulties improves scoring performance, and that IRT-estimated grammar abilities can serve as labels for MTL.
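The following is a minimal sketch, not the authors' implementation, of how ideas (1) and (2) above could be realized: a BERT [CLS] representation is concatenated with a vector of grammatical features, and the joint representation feeds a holistic-score head plus an auxiliary grammar-score head for MTL. The model name, feature dimension, head sizes, and loss weighting are all illustrative assumptions.

```python
# Sketch of a grammar-aware BERT-based AES model (assumptions noted inline).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class GrammarAwareAES(nn.Module):
    def __init__(self, n_grammar_features: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Holistic-score head sees both the essay encoding and the grammatical features.
        self.holistic_head = nn.Linear(hidden + n_grammar_features, 1)
        # Auxiliary head for the grammar score (multi-task learning).
        self.grammar_head = nn.Linear(hidden + n_grammar_features, 1)

    def forward(self, input_ids, attention_mask, grammar_features):
        # Use the [CLS] token vector as the essay representation.
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        joint = torch.cat([cls, grammar_features], dim=-1)
        return (self.holistic_head(joint).squeeze(-1),
                self.grammar_head(joint).squeeze(-1))


# Toy usage with one essay and 10 hypothetical grammatical-item features
# (e.g., IRT-difficulty-weighted usage counts of grammatical items).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = GrammarAwareAES(n_grammar_features=10)
batch = tokenizer(["She have went to school yesterday."],
                  return_tensors="pt", padding=True, truncation=True)
feats = torch.rand(1, 10)
holistic_pred, grammar_pred = model(batch["input_ids"], batch["attention_mask"], feats)
# Joint training objective: holistic-score loss plus a weighted grammar-score loss
# (the 0.5 task weight is an assumed value for illustration).
loss = (nn.MSELoss()(holistic_pred, torch.tensor([3.0]))
        + 0.5 * nn.MSELoss()(grammar_pred, torch.tensor([2.5])))
```

Dropping the grammar head recovers variant (1) (grammatical features as additional inputs only), while replacing it with a decoder that reconstructs the feature vector would correspond to variant (3).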