Abstract
Many methods for sports video analysis have been proposed in the computer vision field. However, analyzing swimming videos remains challenging because noise such as water splashes obscures the swimmer's motion and makes body parts difficult to detect. As a result, it is difficult to automatically estimate a swimmer's motion, especially the stroke. In this paper, we introduce a novel approach to automatically estimating the stroke under such conditions. First, we detect the swimmer in a swimming video using a projective transformation, background subtraction, and a Kalman filter. Next, we create a model that learns a mapping from a window of frames to a point on a one-dimensional (1D) target signal representing the swimmer's stroke, which we call a 'stroke signal'. The model combines a convolutional neural network (CNN) with a multi long short-term memory (Multi-LSTM) network, an extension of LSTM. Finally, we estimate the swimmer's stroke from the stroke signal. On a dataset covering various environments, our system achieved higher accuracy than previous methods.
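To make the data flow of the last two steps concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes that overlapping windows of frames are each mapped to one point of the stroke signal, and that strokes can be counted as local maxima of that signal. The function names `sliding_windows` and `count_strokes`, the window length, the frame size, and the peak threshold are all hypothetical, and a synthetic sinusoid stands in for the output of the CNN and Multi-LSTM model.

```python
import numpy as np


def sliding_windows(frames, window):
    """Stack overlapping windows of `window` consecutive frames.

    Returns an array of shape (n - window + 1, window, ...).
    """
    n = len(frames)
    return np.stack([frames[i:i + window] for i in range(n - window + 1)])


def count_strokes(signal, min_height=0.5):
    """Count strict local maxima of a 1D signal that exceed `min_height`.

    Assumption: one peak of the stroke signal corresponds to one stroke.
    """
    s = np.asarray(signal, dtype=float)
    mid = s[1:-1]
    is_peak = (mid > s[:-2]) & (mid > s[2:]) & (mid > min_height)
    return int(is_peak.sum())


# Placeholder input: 100 blank 8x8 frames (hypothetical size).
frames = np.zeros((100, 8, 8))
windows = sliding_windows(frames, window=16)

# Synthetic stand-in for the model output: 5 s at 20 samples per second,
# one stroke cycle per second.
t = np.arange(100) / 20.0
stroke_signal = np.sin(2 * np.pi * t)
n_strokes = count_strokes(stroke_signal)
```

In this sketch each window would yield one signal value, so a video of 100 frames with a window of 16 frames produces 85 signal points. A real stroke signal is likely to be noisier than the sinusoid used here, so smoothing or a minimum distance between peaks may be needed before counting.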