Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
35th (2021)
Session ID : 4J1-GS-6d-05

Generating Object Manipulation Instructions Including Referring Expressions of Target Objects and Destinations Based on Case Relation Transformer
*Motonari KAMBARA, Komei SUGIURA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

The purpose of this paper is to extend the dataset using a cross-modal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses a Transformer to capture the visual and geometric features of objects in an image. The Case Relation Block allows the CRT to process the objects. We conducted comparative experiments and human evaluations. The experimental results showed that the CRT outperformed the baseline methods.

© 2021 The Japanese Society for Artificial Intelligence