Abstract
This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.