Generating Easy-to-understand Referring Expressions for Target Identifications

Mikihiro TANAKA; Takayuki ITAMOCHI; Kenichi NARIOKA; Ikuro SATO; Yoshitaka USHIKU; Tatsuya HARADA

doi:10.11370/isj.59.591

Abstract

For humans and intelligent agents such as robots to communicate, agents must be able to tell humans what they see. In this article, we present our research on generating sentences that not only refer to objects correctly but also let humans find them quickly. When a target is not salient, locating it is difficult in itself. We therefore designed the model to exploit salient context around the target (e.g., “beside a car”) to help humans find it. Moreover, we optimized sentence generation for ease of understanding, using the time humans required to locate the referred objects and the accuracy with which they did so. To evaluate our system, we created a new dataset from images of Grand Theft Auto V (GTA V). Experimental results showed that our system generated sentences that humans comprehend easily, especially for less salient targets.
