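Generating Object Manipulation Instructions Containing Referring Expressions for Target Objects and Destination Regions Based on the Case Relation Transformer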

Motonari Kambara (神原 元就); Komei Sugiura (杉浦 孔明)

doi:10.11517/pjsai.JSAI2021.0_4J1GS6d05

Abstract

This paper aims to extend a dataset by means of a crossmodal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses a Transformer to capture the visual and geometric features of objects in the image. Its Case Relation Block allows the CRT to process these objects. We conducted comparative experiments and a human evaluation. The experimental results showed that the CRT outperformed the baseline methods.
