A Statistical Study of Late Fusion Strategy in Two-stream ConvNets for Human Action Recognition

Jianfeng XU; Kazuyuki TASAKA; Hiromasa YANAGIHARA

doi:10.11485/itewac.2017.0_13B-3

Abstract

Two-stream ConvNets, which serve as the base of many advanced networks for human action recognition in videos, have shown strong performance on the commonly used UCF101 dataset. This paper statistically analyzes the late fusion strategy in two-stream ConvNets from the frame level to the video level. At the frame level, we report a characteristic of the temporal domain that we call a domino-like effect, which explains why an effective temporal fusion is difficult to design. For the fusion of the two streams, it has been reported that a proper weight can improve performance substantially. Here, however, we give a reason for this that differs from the one in the original report, and we propose a method for calculating an effective weight.
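To make the fusion setting concrete, the following is a minimal sketch of weighted late fusion in general, not the weight-calculation method proposed in the paper. It assumes that each stream outputs per-frame class scores, that frame scores are averaged into a video-level score, and that the two streams are combined by a single scalar weight. The function names, the parameter `w_temporal`, and the toy scores are all hypothetical.

```python
def video_scores(frame_scores):
    """Average per-frame class scores into one video-level score vector."""
    n_frames = len(frame_scores)
    n_classes = len(frame_scores[0])
    return [sum(f[c] for f in frame_scores) / n_frames for c in range(n_classes)]


def late_fusion(spatial, temporal, w_temporal=0.5):
    """Weighted sum of the two streams' video-level scores.

    w_temporal is a hypothetical scalar weight in [0, 1] for the temporal
    stream; the spatial stream receives 1 - w_temporal.
    """
    return [(1.0 - w_temporal) * s + w_temporal * t
            for s, t in zip(spatial, temporal)]


def predict(fused):
    """Return the index of the highest-scoring class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Toy data (made up for illustration): 2 frames, 2 classes per stream.
spatial_frames = [[0.6, 0.4], [0.7, 0.3]]
temporal_frames = [[0.2, 0.8], [0.4, 0.6]]

spatial = video_scores(spatial_frames)
temporal = video_scores(temporal_frames)

pred_equal = predict(late_fusion(spatial, temporal, w_temporal=0.5))
pred_spatial_heavy = predict(late_fusion(spatial, temporal, w_temporal=0.2))
```

With these toy scores the two weights yield different predicted classes, which illustrates why the choice of fusion weight can matter at the video level.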
