Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Analysis of Data Augmentation for Grammatical Error Correction Based on Various Rules
Shota Koyama, Hiroya Takamura, Naoaki Okazaki

2022 Volume 29 Issue 2 Pages 542-586

Abstract

Inadequate training data renders neural grammatical error correction less effective. Recently, researchers have proposed data augmentation methods to address this problem. These methods rest on three assumptions: (1) error diversity in the generated data contributes to performance improvement; (2) generating errors of a certain type improves the correction of errors of that same type; (3) a larger corpus used for error generation yields better performance. In this study, we design multiple error generation rules for various grammatical categories and propose a method that combines these rules to validate the above assumptions by varying the error types in the generated data. Results show that assumptions (1) and (2) are valid, whereas assumption (3) depends on the number of training steps and the number of generated errors. Furthermore, our proposed method can train a high-performance model even in unsupervised settings and corrects writing errors more effectively than a model based on round-trip translation. Finally, we find that the error types corrected by models based on round-trip and back translation differ from those corrected by our method.
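As a hypothetical illustration of the kind of rule-based error generation the abstract describes, the sketch below applies two simple token-level rules (article deletion and preposition substitution) to a clean sentence, producing a synthetic (noised, original) training pair. The rule set and function names here are assumptions for illustration only; the paper's actual rules are more varied and target specific grammatical categories.

```python
# Hypothetical sketch of rule-based error generation for GEC data
# augmentation; the rules shown are illustrative assumptions, not the
# paper's actual rule set.

ARTICLES = {"a", "an", "the"}
PREP_SWAPS = {"in": "on", "on": "in", "at": "in"}

def drop_first_article(tokens):
    """Delete the first article, simulating an article-omission error."""
    for i, tok in enumerate(tokens):
        if tok.lower() in ARTICLES:
            return tokens[:i] + tokens[i + 1:]
    return tokens

def swap_first_preposition(tokens):
    """Replace the first swappable preposition, simulating a preposition error."""
    for i, tok in enumerate(tokens):
        if tok.lower() in PREP_SWAPS:
            return tokens[:i] + [PREP_SWAPS[tok.lower()]] + tokens[i + 1:]
    return tokens

def generate_pair(sentence, rules):
    """Apply each rule in turn; the (noised, original) pair becomes a
    synthetic (source, target) training example for a GEC model."""
    tokens = sentence.split()
    for rule in rules:
        tokens = rule(tokens)
    return " ".join(tokens), sentence

noised, target = generate_pair(
    "She sat on the bench in the park",
    [drop_first_article, swap_first_preposition],
)
# noised: "She sat in bench in the park"
# target: "She sat on the bench in the park"
```

Combining rules in this fashion varies the error types present in the generated data, which is the lever the study uses to test assumptions (1) and (2).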

© 2022 The Association for Natural Language Processing