Understanding Object Manipulation Instructions Containing Referring Expressions about Target Objects Based on Target-dependent UNITER

石川 慎太朗; 杉浦 孔明

doi:10.11517/pjsai.JSAI2021.0_4I2GS7c04

Abstract

Domestic service robots still cannot interact naturally through language, because human instructions are often ambiguous and omit information. Existing methods do not adequately model referring expressions that specify relationships between objects. In this paper, we propose Target-dependent UNITER, which directly learns the relationship between the target object and other objects by focusing on the relevant regions within an image rather than on the whole image. We validate the model on two standard datasets; the results show that Target-dependent UNITER outperforms the baseline method in classification accuracy.
