ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Special Section on Sports Information Processing Technology and Its Application
[Papers] Multimodal Important Scene Detection in Far-view Soccer Videos Based on Single Deep Neural Architecture
Tomoki Haruyama, Sho Takahashi, Takahiro Ogawa, Miki Haseyama
Journal, free access

2020, Volume 8, Issue 2, pp. 89-99


The details of a soccer match can be estimated from its visual and audio sequences, and these details correspond to the occurrence of important scenes. These sequences are therefore well suited to important scene detection. This paper presents a new multimodal method for detecting important scenes from visual and audio sequences in far-view soccer videos based on a single deep neural architecture. The distinctive feature of the method is that multiple classifiers are realized by a single deep neural architecture that includes a Convolutional Neural Network-based feature extractor and a Support Vector Machine-based classifier. This approach addresses the problem that multiple different deep neural architectures cannot be optimized simultaneously from a small amount of training data. The confidence measures that this architecture outputs for the multimodal data are then monitored and integrated to obtain the final classification result.
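
The integration of per-modality confidence measures described above can be illustrated with a minimal sketch. The abstract does not specify how the confidences are computed or combined, so everything here is an assumption made for illustration: the names `sigmoid`, `fuse_confidences`, and `is_important_scene` are hypothetical, the logistic mapping from a classifier score to a confidence in [0, 1] is a stand-in, and the weighted average is one common late-fusion rule, not necessarily the one used in the paper.

```python
import math


def sigmoid(score: float) -> float:
    """Map a signed classifier score to a confidence in (0, 1).

    Assumption: a logistic mapping; the paper's confidence measure may differ.
    """
    return 1.0 / (1.0 + math.exp(-score))


def fuse_confidences(visual_score: float, audio_score: float,
                     visual_weight: float = 0.5) -> float:
    """Combine visual and audio confidences by a weighted average.

    Assumption: simple late fusion; `visual_weight` is a hypothetical parameter.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    visual_conf = sigmoid(visual_score)
    audio_conf = sigmoid(audio_score)
    return visual_weight * visual_conf + (1.0 - visual_weight) * audio_conf


def is_important_scene(visual_score: float, audio_score: float,
                       threshold: float = 0.5) -> bool:
    """Label a segment as important when the fused confidence exceeds the threshold."""
    return fuse_confidences(visual_score, audio_score) > threshold


if __name__ == "__main__":
    # Both modalities give positive scores, so the fused confidence is above 0.5.
    print(is_important_scene(2.0, 1.0))
    # Both modalities give negative scores, so the fused confidence is below 0.5.
    print(is_important_scene(-2.0, -1.0))
```

In this sketch each modality contributes a score from the same classifier design, which mirrors the abstract's idea of reusing one architecture for several classifiers; the weight and threshold would need to be chosen on validation data.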

© 2020 The Institute of Image Information and Television Engineers