Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Enhancing Automated Essay Scoring with Grammatical Features using Multi-task Learning and Item Response Theory
Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura, Taro Watanabe

2025 Volume 32 Issue 2 Pages 438-479

Abstract

In foreign language learning, writing tasks play a crucial role in developing and assessing learners’ language abilities, but manual scoring requires significant time and effort. Automated essay scoring (AES) is a way to mitigate this problem. Although human raters consider grammatical items and their difficulties as clues for judging learners’ proficiency levels while scoring essays, it is unclear whether current state-of-the-art AES models, which use BERT-based essay representations, consider these factors. In this paper, we propose to incorporate grammatical features into BERT-based AES models in three ways: (1) using grammatical features as additional model inputs, (2) performing multi-task learning (MTL) with holistic and grammar scores while using grammatical features as model inputs, and (3) reconstructing grammatical features through MTL with holistic scores. For the grammatical features, we model learners’ grammar usage with item response theory (IRT), which measures learners’ grammar abilities and the characteristics of grammatical items, including their difficulties, from essay data without teacher labels. The experimental results show that grammatical features improve scoring performance and that MTL with holistic and grammar scores brings further improvements. We also show that weighting grammatical items by their IRT-estimated difficulties improves scoring performance, and that IRT-estimated grammar abilities can be used as labels for MTL.
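For reference, a standard two-parameter logistic (2PL) IRT model, one common formulation, relates the probability that a learner produces a grammatical item correctly to the learner’s ability and the item’s discrimination and difficulty. The notation below follows the usual IRT convention and is an illustrative sketch, not necessarily the exact parameterization adopted in the paper.

% Sketch of a standard 2PL IRT model (assumed formulation; the paper's
% exact parameterization may differ). u_{ij} = 1 means learner i uses
% grammatical item j correctly; theta_i is the learner's grammar ability,
% a_j the item's discrimination, and b_j its difficulty.
\begin{equation}
  P(u_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\bigl(-a_j(\theta_i - b_j)\bigr)}
\end{equation}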

© 2025 The Association for Natural Language Processing