2024 Volume 31 Issue 2 Pages 328-348
Cloze tests play an essential role in language assessment and help language learners improve their skills. In this paper, we propose a novel task called Cloze Quality Estimation (CQE): evaluating whether a cloze test is of sufficiently high quality for language assessment, based on two important factors, reliability and sufficiency. As a first step, we created a new dataset named CELA for the CQE task. It contains English cloze tests together with evaluations of their quality annotated by native English speakers, comprising 2,597 instances for reliability and 1,730 instances for sufficiency. We tested baseline evaluation methods on the dataset. The results show that methods focusing only on the options do not perform well on this challenging task, especially in detecting reliability problems. Additional features, such as the context of the questions, are expected to improve detection performance.