2024 Volume 31 Issue 3 Pages 958-983
Data augmentation is a technique for expanding the training data to improve model performance in supervised learning, and it has been widely used in computer vision. However, it remains underdeveloped in natural language processing. In this study, we focus on two data augmentation methods that can be used for Japanese natural language processing tasks. The first method replaces a word in a sentence with another word obtained from the masked language model of a BERT model different from the one used for analysis and inference. The second method shuffles the order of phrases in a way that does not break the dependency relations of the sentence. We provide an overview of each method and the transformations it produces, and then describe the tasks for which each method is effective.
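The second method can be illustrated with a minimal sketch. It is not necessarily the authors' exact procedure: it assumes the sentence has already been split into phrases and parsed into a single-rooted, head-final dependency tree (as is typical for Japanese), given as a list of head indices. Here "not breaking the dependency relations" is assumed to mean that every phrase still precedes its head and no dependencies cross. One way to guarantee this is to shuffle sibling dependents of the same head as whole subtrees and always place the head after them. The function name `shuffle_phrases` and the input format are hypothetical.

```python
import random
from typing import List, Optional


def shuffle_phrases(phrases: List[str], heads: List[int],
                    rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical sketch: reorder phrases without breaking dependencies.

    phrases[i] is a phrase; heads[i] is the index of the phrase it depends on,
    or -1 for the root. Assumes a single-rooted, head-final tree where every
    dependent precedes its head.
    """
    rng = rng or random.Random()
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        raise ValueError("expected exactly one root phrase (head == -1)")

    # Collect the dependents of each phrase
    children = {i: [] for i in range(len(phrases))}
    for i, h in enumerate(heads):
        if h != -1:
            children[h].append(i)

    def linearize(node: int) -> List[int]:
        # Shuffle sibling dependents as whole subtrees, then put the head last,
        # so each subtree stays contiguous and precedes its head
        kids = children[node][:]
        rng.shuffle(kids)
        order: List[int] = []
        for k in kids:
            order.extend(linearize(k))
        order.append(node)
        return order

    return [phrases[i] for i in linearize(roots[0])]


# Usage: "赤い" depends on "花を"; "花を" and "彼が" depend on the verb "買った"
phrases = ["赤い", "花を", "彼が", "買った"]
heads = [1, 3, 3, -1]
print(shuffle_phrases(phrases, heads, random.Random(0)))
```

Because "赤い 花を" is moved as one unit and the verb always stays last, the only outputs are the original order and "彼が 赤い 花を 買った"; orders such as "花を 赤い 彼が 買った" are never produced.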