Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 9, 2020 - June 12, 2020
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. Nonetheless, one of the main limitations of DSRs is their inability to interact naturally through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which are costly to construct. Against this background, we aim to perform automatic sentence generation for fetching instructions, e.g., ``Bring me a green tea bottle on the table.'' This is particularly challenging because appropriate expressions depend on the target object as well as its surroundings. In this paper, we propose a method that generates sentences from visual inputs. Unlike other approaches, the proposed method has multimodal attention branches that utilize subword-level attention and generate sentences based on subword embeddings. In the experiment, we compared the proposed method with a baseline method using four standard image captioning metrics. Experimental results show that the proposed method outperformed the baseline on all of these metrics.
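As a rough illustration of the kind of mechanism described above, the sketch below shows a single decoding step that attends over visual region features conditioned on the current subword embedding. All shapes, parameter names, and the additive (Bahdanau-style) scoring form are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(regions, subword_emb, W_v, W_s, w_a):
    """Score each visual region against the current subword state,
    then return the attention-weighted visual context vector.
    (Hypothetical parameterization for illustration only.)"""
    # Additive scoring: one scalar per region.
    scores = np.tanh(regions @ W_v + subword_emb @ W_s) @ w_a
    alpha = softmax(scores)      # attention weights over regions
    context = alpha @ regions    # weighted sum of region features
    return context, alpha

# Toy dimensions and random features/parameters (assumed for the sketch).
d_v, d_s, d_h, n_regions = 8, 6, 10, 5
regions = rng.normal(size=(n_regions, d_v))  # visual region features
subword = rng.normal(size=d_s)               # current subword embedding
W_v = rng.normal(size=(d_v, d_h))
W_s = rng.normal(size=(d_s, d_h))
w_a = rng.normal(size=d_h)

context, alpha = attention_step(regions, subword, W_v, W_s, w_a)
print(alpha.sum())     # attention weights sum to 1
print(context.shape)   # context lives in the visual feature space
```

In a full model, the context vector would be fed, together with the subword embedding, into a decoder that predicts the next subword; repeating the step yields a fetching instruction such as the example sentence above.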