Learning Spatiotemporal Gaps between Where We Look and What We Focus on

Ryo Yonetani; Hiroaki Kawashima; Takashi Matsuyama

doi:10.2197/ipsjtcva.5.75

Abstract

When we watch videos, there are spatiotemporal gaps between where we look (points of gaze) and what we focus on (points of attentional focus), which result from temporally delayed responses or anticipation in eye movements. We focus on the underlying structure of those gaps and propose a novel learning-based model to predict where humans look in videos. For each point of gaze, the proposed model selects a relevant point of focus in its spatiotemporal neighborhood and jointly learns the salience of that point of focus and its spatiotemporal gap from the point of gaze. In effect, the model tells us that “this point is likely to be looked at because there is a point of focus around the point with a reasonable spatiotemporal gap.” Experimental results on a public dataset demonstrate that the model is effective at predicting points of gaze by learning a particular structure of gaps with respect to the types of eye movements and the types of salient motion in videos.
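
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gaze_score`, the squared-distance gap cost, the search radii, and the fixed weights `w_sal` and `w_gap` are all assumptions made for illustration, whereas the paper learns the salience and gap structure from data. The sketch scores a candidate point of gaze by searching its spatiotemporal neighborhood for the point of focus that best trades off salience against the size of the gap.

```python
import numpy as np

def gaze_score(saliency, point, radius_t=2, radius_xy=3, w_sal=1.0, w_gap=0.05):
    """Score a candidate gaze point by its best nearby focus candidate.

    saliency: array of shape (T, H, W) holding a per-pixel saliency value.
    point:    (t, y, x) index of the candidate point of gaze.
    Returns (best_score, (dt, dy, dx)), where the offset is the selected gap.
    All weights and radii are hypothetical, hand-set values (not learned).
    """
    T, H, W = saliency.shape
    t, y, x = point
    best_score, best_gap = -np.inf, (0, 0, 0)
    for dt in range(-radius_t, radius_t + 1):
        for dy in range(-radius_xy, radius_xy + 1):
            for dx in range(-radius_xy, radius_xy + 1):
                qt, qy, qx = t + dt, y + dy, x + dx
                # Skip focus candidates that fall outside the video volume.
                if not (0 <= qt < T and 0 <= qy < H and 0 <= qx < W):
                    continue
                # Assumed gap cost: squared spatiotemporal distance.
                gap_cost = dt * dt + dy * dy + dx * dx
                score = w_sal * saliency[qt, qy, qx] - w_gap * gap_cost
                if score > best_score:
                    best_score, best_gap = score, (dt, dy, dx)
    return best_score, best_gap

# Toy volume: one salient location, one frame later and one pixel to the right.
volume = np.zeros((5, 9, 9))
volume[3, 4, 5] = 1.0
score, gap = gaze_score(volume, (2, 4, 4))
print(score, gap)
```

In this toy case the selected gap is one frame forward in time and one pixel to the right, which is the sense in which a point can be "likely to be looked at" because a point of focus lies nearby with a reasonable gap.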
