Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2E5-GS-6-04

Accelerated text data augmentation using a paraphrase generation model with round-trip translation as a supervisor
*Shintaro TANAKA, Hitoshi IIMA
Abstract

In machine learning, large amounts of data are needed to improve model performance. However, collecting such data is costly, so a technique called data augmentation is used to generate new data from existing data. In natural language processing, there is a text data augmentation technique called round-trip translation, which translates text into another language and then back into the original language, producing a paraphrase of the original text. However, round-trip translation is computationally expensive and time-consuming because it requires two translations for each text. In this paper, we propose a faster text augmentation method using a model trained to reproduce round-trip translation. The training dataset consists of original texts and the results of their round-trip translation. Experimental results show that the proposed method, using the Text-To-Text Transfer Transformer (T5), can augment data up to about 1.6 times faster than round-trip translation. Furthermore, T5 can generate paraphrases not included in the training data, based on knowledge acquired through pretraining.
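The round-trip translation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `translate_en_to_ja` / `translate_ja_to_en` functions are hypothetical stubs standing in for real machine translation models, and `augment` shows how paraphrases that differ from their originals would be added to a dataset.

```python
# Sketch of round-trip translation for text data augmentation.
# The two translate_* functions are hypothetical stand-ins; a real
# system would invoke neural machine translation models here.

def translate_en_to_ja(text: str) -> str:
    # Stub: pretend English-to-Japanese translation for one example.
    lookup = {"The movie was very good.": "その映画はとても良かった。"}
    return lookup.get(text, text)

def translate_ja_to_en(text: str) -> str:
    # Stub: pretend Japanese-to-English translation for one example.
    lookup = {"その映画はとても良かった。": "The film was really good."}
    return lookup.get(text, text)

def round_trip_translate(text: str) -> str:
    """Generate a paraphrase via two translation calls (out and back)."""
    return translate_ja_to_en(translate_en_to_ja(text))

def augment(dataset: list[str]) -> list[str]:
    """Append round-trip paraphrases that differ from their originals."""
    paraphrases = [round_trip_translate(t) for t in dataset]
    return dataset + [p for p, t in zip(paraphrases, dataset) if p != t]

print(augment(["The movie was very good."]))
```

The pairs `(original, round_trip_translate(original))` are exactly the kind of input/output pairs the paper uses to train T5, so that a single forward pass of the trained model replaces the two translation calls.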

© 2023 The Japanese Society for Artificial Intelligence