Host: 2017 ITE Winter Annual Convention
Name: 2017 ITE Winter Annual Convention
Date: December 12–13, 2017
Serving as the basis of many advanced networks for human action recognition in videos, two-stream ConvNets have shown strong performance on UCF101, a commonly used dataset. This paper statistically analyzes the late fusion strategy in two-stream ConvNets, from the frame level to the video level. We report a characteristic of the temporal domain at the frame level, which we call a domino-like effect; it explains why an effective temporal fusion is difficult to design. For the fusion of the two streams, it has been reported that a proper weight can improve performance substantially. However, we give a different explanation for this improvement than the original report and propose a method for calculating an effective weight.
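As a point of reference for the late fusion discussed above, the following is a minimal sketch under several assumptions. Per-frame softmax scores are averaged to obtain a video-level score, and the two streams are combined as `spatial + w * temporal`. The function names, this weighting convention, and the grid search over candidate weights are hypothetical illustrations. The grid search is a generic validation-set baseline, not the weight-calculation method proposed in this paper.

```python
import math
from typing import List, Sequence, Tuple

# A video is (spatial frame logits, temporal frame logits, true label).
Video = Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]], int]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_score(frame_logits: Sequence[Sequence[float]]) -> List[float]:
    """Frame level -> video level: average the per-frame softmax scores."""
    probs = [softmax(f) for f in frame_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def late_fuse(spatial: Sequence[float], temporal: Sequence[float], w: float) -> List[float]:
    """Weighted late fusion (assumed convention): spatial weight 1, temporal weight w."""
    return [s + w * t for s, t in zip(spatial, temporal)]


def accuracy(videos: Sequence[Video], w: float) -> float:
    """Fraction of videos whose fused video-level score picks the true label."""
    correct = 0
    for spatial_frames, temporal_frames, label in videos:
        fused = late_fuse(video_score(spatial_frames), video_score(temporal_frames), w)
        pred = max(range(len(fused)), key=fused.__getitem__)
        correct += int(pred == label)
    return correct / len(videos)


def search_weight(videos: Sequence[Video], candidates: Sequence[float]) -> float:
    """Hypothetical baseline: pick the candidate weight with the best accuracy.

    Ties go to the earliest candidate, since max() returns the first maximum.
    """
    return max(candidates, key=lambda w: accuracy(videos, w))


if __name__ == "__main__":
    # Toy two-class data: in the first video the spatial stream is wrong
    # and the temporal stream is right, so a larger temporal weight helps.
    videos = [
        ([[1.0, 0.0]], [[0.0, 3.0]], 1),
        ([[0.0, 2.0]], [[0.0, 0.0]], 1),
    ]
    for w in (0.0, 0.5, 1.0, 1.5):
        print(w, accuracy(videos, w))
    print("selected weight:", search_weight(videos, [0.0, 0.5, 1.0, 1.5]))
```

Keeping frame-to-video averaging separate from stream fusion mirrors the two levels at which the abstract analyzes late fusion, so either step can be swapped out independently.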