2023 Volume 30 Issue 2 Pages 330-371
This study constructed an error-tagged evaluation corpus for Japanese grammatical error correction (GEC). Evaluation corpora are essential for assessing model performance. For English GEC, the availability of a variety of evaluation corpora has enabled comprehensive comparison between models and has supported the growth of the research community. For Japanese GEC, by contrast, progress has been hindered by the lack of available evaluation corpora. To address this gap, we constructed a new evaluation corpus for Japanese GEC and made it available to the public. We built the corpus from texts written by learners of Japanese in the Lang-8 corpus, a representative learner corpus in GEC. We aligned the specification of the corpus with the representative corpora and tools in English GEC so that GEC researchers and developers can use it easily. Finally, we evaluated representative GEC models on the new corpus and report baseline scores for future work on Japanese GEC.