Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2E5-GS-6-04

Accelerated text data augmentation using a paraphrase generation model with round-trip translation as a supervisor
*Shintaro TANAKA, Hitoshi IIMA
Abstract

In machine learning, large amounts of data are needed to improve model performance. However, collecting such data is costly, so a technique called data augmentation is used to generate new data from existing data. In natural language processing, there is a text data augmentation technique called round-trip translation, which translates text into another language and then back into the original language, producing a paraphrase of the original text. However, round-trip translation is computationally expensive and time-consuming because it requires two translations for each text. In this paper, we propose a faster text augmentation method using a model trained to reproduce round-trip translation. The training dataset consists of original texts and the results of their round-trip translation. Experimental results show that the proposed method, using the Text-To-Text Transfer Transformer (T5), can augment data up to about 1.6 times faster than round-trip translation. Furthermore, T5 can generate paraphrases not included in the training data, based on knowledge acquired through pretraining.
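The round-trip translation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `translate_en_to_ja` / `translate_ja_to_en` functions are hypothetical stubs standing in for real machine translation models, and `augment` shows how paraphrases that differ from their originals would be added to a dataset.

```python
# Sketch of round-trip translation for text data augmentation.
# The two translate_* functions are hypothetical stand-ins; a real
# system would invoke neural machine translation models here.

def translate_en_to_ja(text: str) -> str:
    # Stub: pretend English-to-Japanese translation for one example.
    lookup = {"The movie was very good.": "その映画はとても良かった。"}
    return lookup.get(text, text)

def translate_ja_to_en(text: str) -> str:
    # Stub: pretend Japanese-to-English translation for one example.
    lookup = {"その映画はとても良かった。": "The film was really good."}
    return lookup.get(text, text)

def round_trip_translate(text: str) -> str:
    """Generate a paraphrase via two translation calls (out and back)."""
    return translate_ja_to_en(translate_en_to_ja(text))

def augment(dataset: list[str]) -> list[str]:
    """Append round-trip paraphrases that differ from their originals."""
    paraphrases = [round_trip_translate(t) for t in dataset]
    return dataset + [p for p, t in zip(paraphrases, dataset) if p != t]

print(augment(["The movie was very good."]))
```

The pairs `(original, round_trip_translate(original))` are exactly the kind of input/output pairs the paper uses to train T5, so that a single forward pass of the trained model replaces the two translation calls.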

© 2023 The Japanese Society for Artificial Intelligence