Abstract
Relating audio-visual events is important for constructing an artificially intelligent system that can acquire audio-visual knowledge of movement through active observation, without explicit teaching. This paper proposes a method for relating multiple audio-visual events observed in a real environment. As cues for correspondence, we use the Gestalt grouping laws: simultaneity, in which a sound occurs together with a change or onset of movement, and similarity, in which sound and movement share a common repetition. Based on their correlation coefficient, the frequency components at sound onsets are related to the short-term space-time invariants (STSTI) of the movement. The effectiveness of the proposed method is demonstrated through a pilot experiment in a real environment.
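As a rough illustration of the correlation-based association described above (a minimal sketch, not the paper's exact formulation), the code below pairs a per-frame audio onset-strength signal with the motion-change signal of each tracked region using the Pearson correlation coefficient; the function name, the per-region motion feature, and the synthetic signals are illustrative assumptions.

```python
import numpy as np

def associate_audio_with_motion(onset_strength, motion_signals):
    """Pick the moving region whose motion-change signal best matches
    the audio onset-strength signal (both sampled once per video frame).

    onset_strength : 1-D array of per-frame audio onset strength (assumed feature)
    motion_signals : dict mapping region id -> 1-D array of per-frame
                     motion-change magnitude (assumed feature)
    Returns (best_region_id, correlation_by_region).
    """
    scores = {}
    for region_id, motion in motion_signals.items():
        n = min(len(onset_strength), len(motion))
        # Pearson correlation between the two per-frame signals
        r = np.corrcoef(onset_strength[:n], motion[:n])[0, 1]
        scores[region_id] = r
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic example: region "b" moves in sync with the sound, region "a" does not.
t = np.arange(200)
audio = (np.sin(2 * np.pi * t / 25) > 0.95).astype(float)    # periodic sound onsets
motion_a = np.random.rand(200) * 0.1                          # unrelated jitter
motion_b = np.roll(audio, 1) + np.random.rand(200) * 0.05     # synchronized motion
best, scores = associate_audio_with_motion(audio, {"a": motion_a, "b": motion_b})
print(best, scores)
```

Under these assumptions, the region whose motion repeats in step with the sound receives the highest correlation and is selected as the source of the audio-visual event.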