Particle Error Correction for Japanese Learners' Compositions from Small-Scale Error Data

Kenji Imamura; Kuniko Saito; Kugatsu Sadamitsu; Hitoshi Nishikawa

doi:10.5715/jnlp.19.381

Abstract

This paper presents a method for correcting grammatical errors in Japanese particles in text written by non-native learners of Japanese. The method is based on discriminative sequence conversion, which corrects particle errors by substitution, insertion, or deletion. For this kind of error correction task, large learner corpora are difficult to collect. We address this problem within a discriminative learning framework using two techniques. First, language model probabilities obtained from large Japanese corpora are combined with n-gram binary features obtained from the learner corpora, so that the correctness of a Japanese sentence can be measured. Second, automatically generated pseudo-error sentences are added to the learner corpora to enrich them directly. In addition, we apply domain adaptation, in which the pseudo-error sentences (the source domain) are adapted to the real-error sentences (the target domain). Experimental results show that recall improves when the language model probabilities and the n-gram binary features are used together, and that using pseudo-error sentences with domain adaptation yields stable improvement.
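
The abstract does not say how the pseudo-error sentences are generated, so the following Python fragment is only a rough sketch of one plausible approach: corrupting particles in correct, pre-tokenized sentences so that each corruption corresponds to one of the three correction operations (substitution, insertion, deletion). The particle list, the function name `make_pseudo_error`, and the rates `error_rate` and `insert_rate` are assumptions made for illustration and are not taken from the paper.

```python
import random

# Illustrative particle inventory (an assumption, not the paper's list).
PARTICLES = ["は", "が", "を", "に", "で", "と", "へ", "の", "も"]


def make_pseudo_error(tokens, error_rate=0.1, insert_rate=0.02, rng=None):
    """Corrupt particles in a correct, tokenized sentence.

    Returns (noisy_tokens, edits). Each edit is a tuple
    (error_type, index_in_original, correct_token, noisy_token).
    """
    if rng is None:
        rng = random.Random()
    noisy, edits = [], []
    for i, tok in enumerate(tokens):
        if tok in PARTICLES:
            if rng.random() < error_rate:
                if rng.random() < 0.5:
                    # Wrong particle: the correction would be a substitution.
                    wrong = rng.choice([p for p in PARTICLES if p != tok])
                    noisy.append(wrong)
                    edits.append(("wrong_particle", i, tok, wrong))
                else:
                    # Missing particle: the correction would be an insertion.
                    edits.append(("missing_particle", i, tok, ""))
            else:
                noisy.append(tok)
        else:
            noisy.append(tok)
            if rng.random() < insert_rate:
                # Extra particle: the correction would be a deletion.
                extra = rng.choice(PARTICLES)
                noisy.append(extra)
                edits.append(("extra_particle", i, "", extra))
    return noisy, edits


if __name__ == "__main__":
    sentence = ["私", "は", "学校", "に", "行く"]
    noisy, edits = make_pseudo_error(sentence, error_rate=0.5, rng=random.Random(0))
    print(noisy, edits)
```

In this sketch the rates are sampled uniformly. A closer imitation of learner behaviour would presumably draw the corruptions from a confusion distribution estimated on the real-error sentences, but that is beyond what the abstract specifies.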

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
