Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
In this study, we achieve bidirectional translation between descriptions and actions using only a small amount of paired data. The ability to generate descriptions from actions, and actions from descriptions, is essential for robots that collaborate with humans in daily life. Such robots must associate real-world objects with linguistic expressions, and machine learning approaches to this problem typically require large-scale paired data. However, paired datasets are costly to construct and difficult to collect. We propose a two-stage training method for bidirectional translation that does not require fully paired data. In the first stage, we pre-train separate autoencoders for descriptions and actions on a large amount of non-paired data. In the second stage, we fine-tune the entire model on a small amount of paired data to combine the intermediate representations of the two autoencoders. We evaluated the method experimentally on a paired dataset consisting of motion-captured actions and their descriptions. The results show that the method performs well even when only a small amount of paired training data is available.
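The two-stage procedure described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the network architecture or the fine-tuning loss, so the sketch assumes tiny linear autoencoders with a one-dimensional latent, synthetic vectors standing in for descriptions and actions, and a squared-difference "binding" term that pulls the two latents together during fine-tuning. All names (`LinearAE`, `finetune_step`, `translation_error`, `DESC_DIR`, `ACT_DIR`) are hypothetical.

```python
import random

class LinearAE:
    """Toy linear autoencoder with a 1-D latent (an assumption, not the paper's model)."""
    def __init__(self, dim):
        self.w_enc = [0.5] * dim
        self.w_dec = [0.5] * dim

    def encode(self, x):
        return sum(w * xi for w, xi in zip(self.w_enc, x))

    def decode(self, z):
        return [w * z for w in self.w_dec]

    def step(self, x, lr, extra_dz=0.0):
        """One SGD step on the squared reconstruction error.
        extra_dz is an additional gradient on the latent (used for binding)."""
        z = self.encode(x)
        err = [xh - xi for xh, xi in zip(self.decode(z), x)]
        dz = 2 * sum(e * w for e, w in zip(err, self.w_dec)) + extra_dz
        self.w_dec = [w - lr * 2 * e * z for w, e in zip(self.w_dec, err)]
        self.w_enc = [w - lr * dz * xi for w, xi in zip(self.w_enc, x)]

def finetune_step(desc_ae, act_ae, desc, act, lr, lam=1.0):
    """Reconstruction losses plus lam * (z_desc - z_act)^2 on one paired sample."""
    gap = desc_ae.encode(desc) - act_ae.encode(act)
    desc_ae.step(desc, lr, extra_dz=2 * lam * gap)
    act_ae.step(act, lr, extra_dz=-2 * lam * gap)

def translation_error(src_ae, dst_ae, pairs):
    """Mean squared error of decoding the source latent with the target decoder."""
    total = 0.0
    for src, dst in pairs:
        pred = dst_ae.decode(src_ae.encode(src))
        total += sum((p - t) ** 2 for p, t in zip(pred, dst))
    return total / len(pairs)

# Synthetic stand-ins: both modalities are driven by one shared scalar s.
DESC_DIR = [1.0, 2.0]
ACT_DIR = [1.0, 0.5, 1.5]

def make(direction, s):
    return [d * s for d in direction]

rng = random.Random(0)
desc_only = [make(DESC_DIR, rng.uniform(-1, 1)) for _ in range(200)]  # non-paired
act_only = [make(ACT_DIR, rng.uniform(-1, 1)) for _ in range(200)]    # non-paired
paired = [(make(DESC_DIR, s), make(ACT_DIR, s))
          for s in (-1 + 0.2 * i for i in range(11))]                 # small paired set
held_out = [(make(DESC_DIR, s), make(ACT_DIR, s))
            for s in (-0.9 + 0.2 * i for i in range(10))]

desc_ae, act_ae = LinearAE(2), LinearAE(3)

# Stage 1: pre-train each autoencoder independently on non-paired data.
for _ in range(50):
    for x in desc_only:
        desc_ae.step(x, lr=0.02)
    for y in act_only:
        act_ae.step(y, lr=0.02)
err_before = translation_error(desc_ae, act_ae, held_out)

# Stage 2: fine-tune both autoencoders jointly on the small paired set.
for _ in range(300):
    for d, a in paired:
        finetune_step(desc_ae, act_ae, d, a, lr=0.02)
err_after = translation_error(desc_ae, act_ae, held_out)
err_after_reverse = translation_error(
    act_ae, desc_ae, [(a, d) for d, a in held_out])

print("description -> action error before fine-tuning:", err_before)
print("description -> action error after fine-tuning: ", err_after)
print("action -> description error after fine-tuning: ", err_after_reverse)
```

In this toy setting, pre-training alone leaves the two latent spaces on different scales, so decoding a description latent with the action decoder is inaccurate; the paired fine-tuning stage aligns the latents, which is what makes translation in both directions possible. The binding term is only one plausible way to combine the representations, and the paper itself should be consulted for the actual loss and model.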