Relating Audio-Visual Events Caused by Multiple Movements: In the Case of Entire Object Movement and Sound Location Change

Jinji Chen; Toshiharu Mukai; Yoshinori Takeuchi; Tetsuya Matsumoto; Hiroaki Kudo; Tsuyoshi Yamamura; Noboru Ohnishi

doi:10.1541/ieejeiss.123.2094

Abstract

Relating audio-visual events is important for constructing an artificially intelligent system that can acquire audio-visual knowledge of moving objects through active observation, without a supervisor. This paper proposes a method for relating multiple audio-visual events observed by a camera and a microphone. The method relies on general laws rather than object-specific knowledge, and it handles cases that include entire object movement and sound location change. As correspondence cues, we use Gestalt grouping laws: the simultaneity of sound onsets and changes in movement, and the similarity of repetition between sound and movement. Based on the correlation coefficient between auditory and visual sequences, the frequency component at each sound onset is related to the spatiotemporal invariant sequence (STI sequence) of a movement. Experiments in a real environment gave satisfactory results, showing the effectiveness of the proposed method.
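The correlation-based relating step can be illustrated with a minimal sketch. It assumes that each auditory frequency component and each visual STI sequence has already been reduced to a per-frame indicator sequence of equal length: 1 where a sound onset or a change in movement occurs, and 0 elsewhere. The abstract does not specify this representation. The function names `pearson` and `relate`, the example labels, and the acceptance threshold of 0.5 are hypothetical illustrations, not the authors' implementation. Each auditory component is related to the visual sequence with which it correlates most strongly, provided the correlation exceeds the threshold.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (zero variance),
    because such a sequence carries no timing cue to correlate.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def relate(audio: Dict[str, List[int]],
           visual: Dict[str, List[int]],
           threshold: float = 0.5) -> Dict[str, str]:
    """Relate each auditory component to its best-correlated visual sequence.

    audio:  hypothetical frequency-component label -> per-frame onset indicators
    visual: hypothetical STI-sequence label -> per-frame movement-change indicators
    threshold: assumed minimum correlation for accepting a relation
    Components whose best correlation does not exceed the threshold are left unrelated.
    """
    relations: Dict[str, str] = {}
    for a_name, a_seq in audio.items():
        best_name: Optional[str] = None
        best_r = threshold
        for v_name, v_seq in visual.items():
            r = pearson(a_seq, v_seq)
            if r > best_r:
                best_name, best_r = v_name, r
        if best_name is not None:
            relations[a_name] = best_name
    return relations


if __name__ == "__main__":
    # Two hypothetical sound sources, each repeating in step with one moving object.
    audio = {"500Hz": [1, 0, 0, 1, 0, 0, 1, 0],
             "2kHz": [0, 1, 0, 0, 0, 1, 0, 0]}
    visual = {"objA": [1, 0, 0, 1, 0, 0, 1, 0],
              "objB": [0, 1, 0, 0, 0, 1, 0, 0]}
    print(relate(audio, visual))
```

In this toy input, each sound component's onsets coincide exactly with one object's movement changes and are anti-correlated with the other object's, so the sketch pairs "500Hz" with "objA" and "2kHz" with "objB". Accepting only the single best match is a simplifying choice here. A real system would also have to deal with noisy onset detection and with timing offsets between the two modalities.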
