Abstract
From a psychological point of view, people are said to feel comfortable when the rhythms of sound and video are synchronized. For this reason, synchronization work is performed frequently during content production. We aim to develop a computer-assisted system for this synchronization, which has so far been done manually. Our system detects rapid variations in the sound and the video as accents, and adjusts the playback speed of the video so that the accents of the video match those of the sound. To detect the accents, we first calculate the time-varying local variance of the variation in each signal and use it to determine local thresholds for detection. Next, we extract the parts of the signal that exceed the thresholds as accents, and assign the amount by which each exceeds its threshold as the weight of that accent. In the synchronization process, we first take into account both the weights and the time differences between the occurrences of the sound accents and the video accents, and search for the best-matched video accent for each sound accent. Then, we adjust the playback speed of the video so that the time difference between the accents in each matched pair is small enough that people do not feel uncomfortable.
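The pipeline described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the window size, the threshold factor k, the scoring formula, and the max_gap limit are all assumptions chosen for illustration. The sketch treats the absolute frame-to-frame difference as the "variation", sets the local threshold to k times the local standard deviation, and, for simplicity, aligns matched accents exactly rather than only within a perceptual tolerance. Accent indices from the detector would be divided by the sampling rate to obtain times in seconds.

```python
from math import sqrt
from statistics import pvariance


def detect_accents(signal, window=5, k=2.0):
    """Return (sample_index, weight) pairs for rapid variations in `signal`.

    Assumed rule: an accent occurs where the absolute first difference
    exceeds k times the standard deviation of the differences in a
    local window; the weight is the amount above that threshold.
    """
    diff = [abs(b - a) for a, b in zip(signal, signal[1:])]
    accents = []
    for i, d in enumerate(diff):
        lo, hi = max(0, i - window), min(len(diff), i + window + 1)
        threshold = k * sqrt(pvariance(diff[lo:hi]))
        if d > threshold:
            accents.append((i + 1, d - threshold))
    return accents


def match_accents(sound, video, max_gap=0.5):
    """For each sound accent (time, weight), pick the best video accent.

    Assumed score: the product of the two weights, discounted by the
    time difference. Candidates farther apart than max_gap seconds
    are ignored.
    """
    pairs = []
    for ts, ws in sound:
        best, best_score = None, 0.0
        for tv, wv in video:
            gap = abs(ts - tv)
            if gap > max_gap:
                continue
            score = ws * wv / (1.0 + gap)
            if score > best_score:
                best, best_score = tv, score
        if best is not None:
            pairs.append((ts, best))
    return pairs


def playback_speeds(pairs):
    """Return (video_start, video_end, speed) for each video segment
    between consecutive matched accents, so that the segment lasts as
    long as the corresponding sound interval."""
    speeds = []
    for (s0, v0), (s1, v1) in zip(pairs, pairs[1:]):
        if s1 > s0 and v1 > v0:  # skip pairs that would reverse time
            speeds.append((v0, v1, (v1 - v0) / (s1 - s0)))
    return speeds


if __name__ == "__main__":
    # Hypothetical accent lists given as (time in seconds, weight).
    sound = [(1.0, 1.0), (2.0, 1.0)]
    video = [(1.1, 1.0), (2.3, 0.5), (2.4, 2.0)]
    pairs = match_accents(sound, video)
    print(pairs)
    print(playback_speeds(pairs))
```

In this example the second sound accent is paired with the heavier video accent at 2.4 s rather than the nearer one at 2.3 s, which shows how the weights and the time difference trade off under the assumed score. A speed factor above 1 means the video segment is played faster so that its closing accent lands on the sound accent.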