Effective Data Augmentation Methods for Japanese NLP Tasks

高萩 恭介; 古宮 嘉那子; 新納 浩幸

doi:10.5715/jnlp.31.958

Abstract

Data augmentation is a technique for expanding training data to improve model performance in supervised learning, and it has been widely used in computer vision. In natural language processing, however, the technique remains underdeveloped. In this study, we focus on two data augmentation methods that can be applied to Japanese natural language processing tasks. The first method replaces a word in a sentence with another word chosen using the masked language model of a BERT different from the one used for analysis and inference. The second method shuffles the order of phrases in a way that does not break the dependency relations of the sentence. We give an overview of each method and the conversion it performs, and then describe the tasks for which each method is effective.
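
The two methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `replace_one_word`, `shuffle_phrases`, and `propose` are hypothetical. The `propose` callback stands in for the masked language model of the second BERT, and `heads` is assumed to come from a phrase-level dependency parser, with every phrase depending on a later phrase and exactly one root. "Not breaking the dependency relations" is read here as keeping each phrase before its head and keeping each subtree contiguous, which is only one possible interpretation.

```python
import random
from typing import Callable, Dict, List, Sequence


def replace_one_word(tokens: Sequence[str],
                     propose: Callable[[List[str], int], str],
                     rng: random.Random) -> List[str]:
    """Mask one randomly chosen token and substitute a proposed word.

    `propose` is a hypothetical stand-in for a masked language model:
    it receives the masked token list and the masked position and
    returns a replacement token.
    """
    position = rng.randrange(len(tokens))
    masked = list(tokens)
    masked[position] = "[MASK]"
    augmented = list(tokens)
    augmented[position] = propose(masked, position)
    return augmented


def shuffle_phrases(phrases: Sequence[str],
                    heads: Sequence[int],
                    rng: random.Random) -> List[str]:
    """Reorder phrases while keeping every phrase before its head.

    heads[i] is the index of the phrase that phrase i depends on,
    or -1 for the root (assumed to occur exactly once).
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(phrases))}
    root = -1
    for index, head in enumerate(heads):
        if head == -1:
            root = index
        else:
            children[head].append(index)

    def linearize(node: int) -> List[str]:
        # Shuffle only the order of sibling subtrees; each subtree
        # stays contiguous and is emitted before its head phrase.
        dependents = list(children[node])
        rng.shuffle(dependents)
        ordered: List[str] = []
        for dependent in dependents:
            ordered.extend(linearize(dependent))
        ordered.append(phrases[node])
        return ordered

    return linearize(root)


if __name__ == "__main__":
    rng = random.Random(0)
    phrases = ["赤い", "本を", "昨日", "買った"]
    heads = [1, 3, 3, -1]  # 赤い -> 本を, 本を -> 買った, 昨日 -> 買った
    print(shuffle_phrases(phrases, heads, rng))
    print(replace_one_word(["本", "を", "買っ", "た"],
                           lambda masked, pos: "<new>", rng))
```

In this sketch the modifier 赤い always stays immediately before 本を and the predicate 買った always stays last, so only the relative order of sibling phrases changes. In practice `propose` would wrap a real masked language model, and the parser supplying `heads` is left unspecified here.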

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
