Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Effective Data Augmentation Methods for Japanese NLP Tasks
Kyosuke Takahagi, Kanako Komiya, Hiroyuki Shinnou
JOURNAL FREE ACCESS

2024 Volume 31 Issue 3 Pages 958-983

Abstract

Data augmentation expands training data to improve model performance in supervised learning and is widely used in computer vision. However, the technique remains underdeveloped in natural language processing. In this study, we focus on two data augmentation methods applicable to Japanese natural language processing tasks. The first method replaces a word in a sentence with another word using the masked language model of a BERT model different from the one used for analysis and inference. The second method shuffles the order of phrases in a way that does not break the dependency relations of the sentence. We provide an overview of each method and its corresponding conversion, and then describe the tasks for which each method is effective.
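
The second method can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that each sentence has already been segmented into phrases and that every phrase carries the index of the phrase it modifies (with -1 for the final, root phrase), as a dependency parser would provide. The function name `shuffle_phrases` and the input format are hypothetical. The sketch relies on the head-final property of Japanese: it shuffles the order of the phrases that modify the same head, while keeping each dependent subtree contiguous and placed before its head, so no dependency relation is broken.

```python
import random
from typing import Dict, List, Optional


def shuffle_phrases(
    phrases: List[str],
    heads: List[int],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Reorder phrases without breaking dependency relations.

    phrases: phrase strings in their original order (assumed input format).
    heads:   heads[i] is the index of the phrase that phrases[i] modifies,
             or -1 for the single root phrase (assumed input format).
    """
    if len(phrases) != len(heads):
        raise ValueError("phrases and heads must have the same length")
    if heads.count(-1) != 1:
        raise ValueError("exactly one root phrase (head == -1) is required")

    rng = rng or random.Random()

    # Collect the dependents of every phrase.
    children: Dict[int, List[int]] = {i: [] for i in range(len(phrases))}
    for i, h in enumerate(heads):
        if h != -1:
            children[h].append(i)
    root = heads.index(-1)

    def linearize(node: int) -> List[int]:
        # Shuffle sibling subtrees, then emit each subtree followed by its head.
        deps = children[node][:]
        rng.shuffle(deps)
        order: List[int] = []
        for d in deps:
            order.extend(linearize(d))
        order.append(node)
        return order

    order = linearize(root)
    if len(order) != len(phrases):
        raise ValueError("heads do not form a single tree")
    return [phrases[i] for i in order]


# "太郎は 赤い 本を 読んだ": 赤い modifies 本を; 太郎は and 本を modify 読んだ.
augmented = shuffle_phrases(["太郎は", "赤い", "本を", "読んだ"], [3, 2, 3, -1])
print(" ".join(augmented))
```

In this toy sentence, the only possible outputs are the original order and "赤い 本を 太郎は 読んだ": the modifier 赤い always stays directly before 本を, and the predicate 読んだ stays sentence-final.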

© 2024 The Association for Natural Language Processing