Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: Kumamoto, Japan
Date: June 06, 2023 - June 09, 2023
In machine learning, large amounts of data are needed to improve model performance. However, collecting such data is costly, so a technique called data augmentation is used to generate new data from existing data. In natural language processing, there is a text data augmentation technique called round-trip translation, which translates text into another language and then translates it back into the original language to generate a paraphrase of the original text. However, round-trip translation is computationally expensive and time-consuming because it requires two translations for each text. In this paper, we propose a faster text augmentation method using a model trained to reproduce the results of round-trip translation. The training dataset consists of original texts paired with the results of their round-trip translation. Experimental results show that the proposed method, using the Text-To-Text Transfer Transformer (T5), can augment data up to about 1.6 times faster than round-trip translation. Furthermore, T5 can generate paraphrases not included in the training data, based on the knowledge acquired through pretraining.
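To make the contrast concrete, the following is a minimal sketch of the two approaches, assuming the Hugging Face transformers library. The Marian checkpoint names are common public models used here only as examples of the forward and backward translators, and "finetuned-t5" is a hypothetical placeholder for a T5 checkpoint fine-tuned on pairs of original texts and their round-trip translations, as described above; this is an illustration of the idea, not the authors' released code.

```python
from transformers import (
    MarianMTModel,
    MarianTokenizer,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# Baseline: round-trip translation requires two translation passes per text.
def round_trip_translate(text: str) -> str:
    fwd_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
    fwd = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
    bwd_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
    bwd = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-de-en")

    # Pass 1: original language -> pivot language.
    mid_ids = fwd.generate(**fwd_tok(text, return_tensors="pt"))
    pivot = fwd_tok.batch_decode(mid_ids, skip_special_tokens=True)[0]
    # Pass 2: pivot language -> original language (yields a paraphrase).
    out_ids = bwd.generate(**bwd_tok(pivot, return_tensors="pt"))
    return bwd_tok.batch_decode(out_ids, skip_special_tokens=True)[0]

# Proposed: a single pass through a T5 model fine-tuned to imitate
# round-trip translation ("finetuned-t5" is a hypothetical checkpoint
# name standing in for such a model).
def t5_paraphrase(text: str) -> str:
    tok = T5Tokenizer.from_pretrained("finetuned-t5")
    model = T5ForConditionalGeneration.from_pretrained("finetuned-t5")
    out_ids = model.generate(**tok(text, return_tensors="pt"))
    return tok.batch_decode(out_ids, skip_special_tokens=True)[0]

if __name__ == "__main__":
    sentence = "Data augmentation generates new training examples."
    print(round_trip_translate(sentence))  # two sequential decoding passes
    print(t5_paraphrase(sentence))         # one decoding pass
```

Under these assumptions, the speed advantage comes from replacing two sequential decoding passes with a single one, which is consistent with the roughly 1.6x speedup reported above.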