Construction of an Error-Tagged Evaluation Corpus for Japanese Grammatical Error Correction

小山 碧海; 喜友名 朝視顕; 小林 賢治; 新井 美桜; 三田 雅人; 岡 照晃; 小町 守

doi:10.5715/jnlp.30.330

Abstract

This study constructed an error-tagged evaluation corpus for Japanese grammatical error correction (GEC). Evaluation corpora are essential for assessing model performance. For English GEC, the availability of various evaluation corpora has enabled comprehensive comparisons between models and supported the growth of the research community. For Japanese GEC, by contrast, progress has been hindered by the lack of available evaluation corpora. We therefore constructed a new evaluation corpus for Japanese GEC and made it publicly available. The corpus is built from texts written by learners of Japanese in the Lang-8 corpus, a representative learner corpus in GEC. Its specification was modified to align with the representative corpora and tools of English GEC, so that GEC researchers and developers can use it easily. Finally, we evaluated representative GEC models on the corpus and report baseline scores for future work on Japanese GEC.

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
