Construction and Evaluation of the Visual Recipe Flow Dataset for Predicting the Visual State of Objects after Cooking Actions

白井 圭佑; 橋本 敦史; 西村 太一; 亀甲 博貴; 栗田 修平; 森 信介

doi:10.5715/jnlp.30.1042

General Paper (Peer-Reviewed)

Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Shinsuke Mori

Keywords: Annotation, Multimodal, Graph Structure

2023 Volume 30 Issue 3 Pages 1042-1060

DOI https://doi.org/10.5715/jnlp.30.1042

Abstract

We present Visual Recipe Flow, a new multimodal dataset for learning the result of each cooking action described in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. Each state change is represented as an image pair, and the workflow is represented as a recipe flow graph (r-FG). We explain the data collection and annotation procedure and evaluate the dataset by measuring inter-annotator agreement. Finally, we investigate the importance of each annotation component through multimodal information retrieval experiments.
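
As a rough illustration of the two components named in the abstract, the following minimal Python sketch pairs a before/after image record with a small labeled directed graph. It is not the dataset's actual schema: the class names `StatePair` and `RecipeFlowGraph`, the file paths, the node ids, and the edge label `"target"` are all hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StatePair:
    """One object state change: the image before and after an action."""
    before_image: str  # hypothetical file path
    after_image: str   # hypothetical file path


@dataclass
class RecipeFlowGraph:
    """Toy directed graph over expressions in a recipe text (assumed layout)."""
    nodes: dict[str, str] = field(default_factory=dict)  # node id -> text span
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, label)
    # (action node id, object node id) -> image pair for that state change
    state_pairs: dict[tuple[str, str], StatePair] = field(default_factory=dict)

    def add_edge(self, source: str, target: str, label: str) -> None:
        """Add a labeled edge; both endpoints must already be nodes."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((source, target, label))

    def inputs_of(self, node_id: str) -> list[str]:
        """Return ids of nodes with an edge pointing into node_id."""
        return [s for s, t, _ in self.edges if t == node_id]


# Hypothetical two-node example: the object "carrot" flows into the action "cut".
graph = RecipeFlowGraph()
graph.nodes["n1"] = "carrot"
graph.nodes["n2"] = "cut"
graph.add_edge("n1", "n2", "target")  # edge label is an assumption
graph.state_pairs[("n2", "n1")] = StatePair(
    before_image="images/step1_before.jpg",
    after_image="images/step1_after.jpg",
)
```

Keying the image pairs by an (action, object) tuple is one possible design choice; it lets a single action that affects several objects carry one image pair per object.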