2003 Volume 123 Issue 12 Pages 2094-2102
Relating audio-visual events is important for constructing an artificially intelligent system that can acquire audio-visual knowledge of moving objects through active observation, without a supervisor. This paper proposes a method for relating multiple audio-visual events observed by a camera and a microphone according to general laws, without object-specific knowledge. The method copes with movement of the entire object and with changes in sound location. As correspondence cues, we use Gestalt grouping laws: simultaneity of sound onsets and changes in movement, and similarity of repetition between sound and movement. Based on the correlation coefficient between the auditory and visual sequences, the frequency component at sound onset is related to the spatiotemporal invariant sequence (STI sequence) of movement. Experiments in a real environment gave satisfactory results, showing the effectiveness of the proposed method.
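The correlation-based matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how the STI sequence or the onset sequence is computed, so the code assumes both are already available as equal-length lists sampled at the same frame rate. The function names `pearson` and `best_match`, the labels `"A"` and `"B"`, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant sequence carries no timing information; treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_match(audio_seq, visual_seqs):
    """Return the label of the visual sequence most correlated with audio_seq,
    together with the score of every candidate."""
    scores = {name: pearson(audio_seq, seq) for name, seq in visual_seqs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: 1 marks a frame with a sound onset in one frequency band.
onsets = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical per-frame motion-change magnitudes for two moving objects.
motion = {
    "A": [0.1, 0.9, 0.2, 0.1, 0.8, 0.1, 0.0, 0.9, 0.1, 0.2],  # changes coincide with onsets
    "B": [0.9, 0.1, 0.1, 0.8, 0.2, 0.1, 0.9, 0.0, 0.2, 0.8],  # changes are out of phase
}

best, scores = best_match(onsets, motion)
```

In this toy data, object A changes its movement on the same frames as the sound onsets and repeats with the same period, so it scores higher than object B. This reflects the two cues named in the abstract, simultaneity and similarity of repetition, in the simplest possible form.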
The transactions of the Institute of Electrical Engineers of Japan.C
The Journal of the Institute of Electrical Engineers of Japan