ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Regular Section
[Paper] Semantic Concept Detection based on Spatial Pyramid Matching and Semi-supervised Learning
Yoshihiko Kawai, Mahito Fujii

2013 Volume 1 Issue 2 Pages 190-198


Analyzing the semantic content of video is essential for finding a desired video in a large archive of accumulated video data. A conventional method for detecting objects depicted in video, the bag-of-visual-words method, is based on the occurrence frequencies of local features. We propose a method that improves on the detection accuracy of the traditional method by dividing video frames into overlapping sub-regions of various sizes. The method computes local and global features for each of these sub-regions so that the feature vectors reflect spatial position. These changes make the method robust to variations in the size and position of objects appearing in the video. We also propose a training framework based on semi-supervised learning that starts from a small number of labeled data points and efficiently generates additional labeled training data with few errors. Experiments on a video data set confirmed improved detection accuracy over earlier methods.
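
As an illustration of the sub-region idea described above, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not specify the grid levels, the amount of overlap, the visual vocabulary, or the global features, so the function names `overlapping_regions` and `pyramid_bovw`, the `levels` and `overlap` parameters, and the L1 normalization are all assumptions made for this example. It covers only the local-feature part: one visual-word histogram per sub-region, concatenated into a single vector.

```python
import numpy as np


def overlapping_regions(width, height, levels=(1, 2), overlap=0.5):
    """Yield (x0, y0, x1, y1) boxes for each pyramid level.

    Assumption for illustration: at level g the window is 1/g of the frame
    in each dimension, and neighbouring windows overlap by `overlap`.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    for g in levels:
        w, h = width / g, height / g
        # Number of window positions per axis so that the windows tile the
        # frame with the requested overlap (level 1 is the whole frame).
        n = 1 if g == 1 else int(round((g - 1) / (1.0 - overlap))) + 1
        xs = np.linspace(0.0, width - w, n)
        ys = np.linspace(0.0, height - h, n)
        for y0 in ys:
            for x0 in xs:
                yield (x0, y0, x0 + w, y0 + h)


def pyramid_bovw(points, words, vocab_size, width, height,
                 levels=(1, 2), overlap=0.5):
    """Concatenate per-region visual-word histograms.

    points: (N, 2) array of keypoint (x, y) positions inside the frame.
    words:  (N,) integer array of visual-word indices in [0, vocab_size).
    """
    feats = []
    for x0, y0, x1, y1 in overlapping_regions(width, height, levels, overlap):
        inside = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                  (points[:, 1] >= y0) & (points[:, 1] < y1))
        hist = np.bincount(words[inside], minlength=vocab_size).astype(float)
        total = hist.sum()
        if total > 0:
            hist /= total  # L1-normalize so region size does not dominate
        feats.append(hist)
    return np.concatenate(feats)


# Hypothetical toy input: two keypoints in a 100x100 frame, vocabulary of 3.
points = np.array([[10.0, 10.0], [60.0, 60.0]])
words = np.array([0, 1])
feature = pyramid_bovw(points, words, vocab_size=3, width=100, height=100)
print(feature.shape)
```

Because the windows overlap, a keypoint near a region boundary contributes to more than one histogram, which is one plausible way to soften the effect of small shifts in object position.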

© 2013 The Institute of Image Information and Television Engineers