Abstract
We propose a video annotation system called “AnnoTone”, which supports video-editing tasks such as cropping and effects generation by embedding annotations during recording. The annotations describe contextual information about a scene, such as the geo-location of the video camera and the quality of the actors' performance. The system converts input annotation data into high-frequency audio signals, which are almost inaudible to the human ear, and transmits them from a smartphone speaker placed near the video camera. After recording, the embedded annotations are extracted from the video files and used to support video editing. Although the signals are not completely inaudible, we confirmed that audio filters can remove them from video files without considerable quality loss. We also experimentally tested the reliability of signal embedding and the robustness of the annotation signals against audio conversions, and showed the feasibility of the proposed technique in practical situations. We present several example applications of AnnoTone and discuss the possibility of novel video-editing techniques enabled by annotation embedding.
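To make the idea of carrying annotation data in near-inaudible audio concrete, the following is a minimal sketch, not AnnoTone's actual signal design. It assumes a simple two-tone scheme (one tone per bit value) at 18 kHz and 19 kHz, a 44.1 kHz sample rate, and 20 ms per bit; all of these parameters and the function names `encode`, `decode`, and `goertzel_power` are hypothetical choices for illustration. Decoding compares the energy at the two tone frequencies using the Goertzel algorithm.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 44100      # Hz, assumed audio sample rate
SAMPLES_PER_BIT = 882    # 20 ms per bit at 44.1 kHz
FREQ_ZERO = 18000.0      # Hz, assumed tone for a 0 bit
FREQ_ONE = 19000.0       # Hz, assumed tone for a 1 bit


def encode(data):
    """Turn bytes into audio samples, one high-frequency tone burst per bit."""
    samples = []
    for byte in data:
        for i in range(8):
            bit = (byte >> (7 - i)) & 1  # most significant bit first
            freq = FREQ_ONE if bit else FREQ_ZERO
            samples.extend(
                math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
                for t in range(SAMPLES_PER_BIT)
            )
    return samples


def goertzel_power(block, freq):
    """Estimate signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode(samples):
    """Recover bytes by comparing tone power in each bit-length block."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        block = samples[start:start + SAMPLES_PER_BIT]
        one = goertzel_power(block, FREQ_ONE)
        zero = goertzel_power(block, FREQ_ZERO)
        bits.append(1 if one > zero else 0)
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for b in bits[i:i + 8]:
            value = (value << 1) | b
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    annotation = b"take 3: good"
    recovered = decode(encode(annotation))
    print(recovered == annotation)
```

This sketch round-trips clean, synchronized samples only. A practical system would also have to handle synchronization, background noise, speaker and microphone frequency response, and lossy audio compression, which are the conditions the reliability and conversion experiments address.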