2020 Volume 25 Issue 2 Pages 39-47
Computer vision is a broad field of study, ranging from object detection, tracking, and recognition to human-computer interaction. Demand for computer vision keeps growing, driven by applications in surveillance, video retrieval, and human-machine interaction. Human activity recognition has become an important topic in this field because it is key to automatic video surveillance. In recent years, a large number of sensor-based and video-based datasets have been created for human activity analysis, but most of them target single-person activity recognition. In this research, we address the more challenging problem of recognizing two-person interactions in video. We worked on two benchmark datasets for this task: the University of Texas at Austin (UTA) interaction dataset and the Stony Brook University dataset. We propose a gradient-based technique for interaction recognition in which the gradients at points within the region of interest of each video frame are used as features. As part of this process, we introduce a technique for finding the region of interest based on moving-object tracking. Variations in how motions are performed, differences between individuals, and recording conditions make the task extremely challenging. The proposed method achieves a recognition rate of 68.33% on the University of Texas at Austin two-person interaction dataset and performs better on the indoor videos of the Stony Brook University dataset, with an accuracy of 74.40%.
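The two stages named above can be illustrated with a small sketch: locate a region of interest from motion, then take the gradient at each point inside that region as a feature. This is not the authors' implementation. The abstract does not say how their tracking works, so plain frame differencing is swapped in here as a stand-in, and the function names `motion_roi` and `gradient_features`, the `threshold` and `pad` parameters, and the central-difference gradient are all hypothetical choices made for illustration. Frames are grayscale images stored as lists of rows.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]          # grayscale frame, frame[y][x]
Box = Tuple[int, int, int, int]    # (top, left, bottom, right); bottom/right exclusive


def motion_roi(prev: Frame, curr: Frame, threshold: float = 10.0,
               pad: int = 1) -> Optional[Box]:
    """Bounding box (grown by `pad`) of pixels that changed by more than `threshold`.

    Hypothetical stand-in for the paper's moving-object tracking step:
    simple frame differencing, not the authors' method.
    """
    h, w = len(curr), len(curr[0])
    changed = [(y, x) for y in range(h) for x in range(w)
               if abs(curr[y][x] - prev[y][x]) > threshold]
    if not changed:
        return None  # no motion detected between the two frames
    ys = [y for y, _ in changed]
    xs = [x for _, x in changed]
    # Grow the box by `pad` pixels and clamp it to the image bounds.
    return (max(min(ys) - pad, 0), max(min(xs) - pad, 0),
            min(max(ys) + 1 + pad, h), min(max(xs) + 1 + pad, w))


def gradient_features(frame: Frame, roi: Box) -> List[Tuple[float, float]]:
    """Central-difference gradient (gx, gy) at each point inside `roi`, row by row."""
    h, w = len(frame), len(frame[0])
    top, left, bottom, right = roi
    feats = []
    # Skip the outermost image border, where central differences are undefined.
    for y in range(max(top, 1), min(bottom, h - 1)):
        for x in range(max(left, 1), min(right, w - 1)):
            gx = (frame[y][x + 1] - frame[y][x - 1]) / 2.0
            gy = (frame[y + 1][x] - frame[y - 1][x]) / 2.0
            feats.append((gx, gy))
    return feats


if __name__ == "__main__":
    prev = [[0.0] * 6 for _ in range(6)]
    curr = [row[:] for row in prev]
    curr[2][3] = 100.0               # a bright "object" appears at row 2, column 3
    roi = motion_roi(prev, curr)
    print(roi)                       # (1, 2, 4, 5)
    print(gradient_features(curr, roi))
```

Padding the box by one pixel matters here: without it the region would contain only the changed pixel itself, whose neighbours are equal, so every gradient would be zero. How the per-point gradients are aggregated and classified into interaction labels is not described in the abstract and is not modelled in this sketch.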