One of the leading causes of vehicle accidents is irregular driver behavior. It is therefore crucial to detect such abnormal behaviors and develop a warning system to prevent potentially dangerous accidents. In this paper, we present a novel approach to recognizing driver behavior by learning temporal geometric features from the relationship between human pose estimation keypoints and surrounding objects. Existing approaches either rely on human pose skeleton data alone, require additional inputs such as sensor readings, depth map images, or optical flow images to detect human activity, or run a separate object detection model and fuse its output, which is both complex and time-consuming. In our two-stage approach, the first stage performs human pose estimation and object detection with a single anchorless model, extracting the essential spatial data required to predict human activity from a 2D image. The second stage takes this output, engineers features from the positions, distances, and angles between various joints and objects based on the correlation factor, and feeds them to a long short-term memory (LSTM) model that learns the temporal relations in the time series data. The model predicts not only the action classes but also future data, which is crucial in driver monitoring systems to continuously assess the likelihood of critical actions and thereby avoid accidents.
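The sketch below is a minimal illustration, not the authors' implementation, of the second stage described in the abstract: per-frame geometric features (distances and angles between pose keypoints and a detected object's center) are fed to an LSTM with two heads, one for action classification and one for forecasting the next frame's features. All names (geometric_features, DriverActionLSTM), dimensions, and the choice of a COCO-style 17-keypoint skeleton are illustrative assumptions.

```python
# Illustrative sketch of geometric feature engineering + LSTM prediction.
# Assumed, not taken from the paper: feature layout, hidden size, class count.
import torch
import torch.nn as nn


def geometric_features(keypoints, object_center):
    """Distance and angle between each keypoint and a detected object's center.

    keypoints:     (num_keypoints, 2) tensor of (x, y) pixel coordinates
    object_center: (2,) tensor with the object's (x, y) center
    returns:       (num_keypoints * 2,) tensor -> [distances..., angles...]
    """
    deltas = keypoints - object_center                 # per-joint offsets
    dists = torch.linalg.norm(deltas, dim=1)           # Euclidean distance per joint
    angles = torch.atan2(deltas[:, 1], deltas[:, 0])   # orientation per joint (radians)
    return torch.cat([dists, angles])                  # flattened feature vector


class DriverActionLSTM(nn.Module):
    """LSTM over per-frame geometric features with two heads:
    action classification and next-step feature forecasting."""

    def __init__(self, feat_dim, hidden_dim=128, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_classes)  # action class logits
        self.forecast_head = nn.Linear(hidden_dim, feat_dim)   # predicted next features

    def forward(self, feature_seq):
        # feature_seq: (batch, time, feat_dim)
        out, _ = self.lstm(feature_seq)
        last = out[:, -1]                               # hidden state at final time step
        return self.action_head(last), self.forecast_head(last)


if __name__ == "__main__":
    num_keypoints, seq_len = 17, 30                     # assumed COCO-style skeleton, 30-frame window
    feat_dim = num_keypoints * 2                        # distance + angle per joint

    # Dummy sequence standing in for stage-one outputs (pose keypoints + object center).
    frames = []
    for _ in range(seq_len):
        kps = torch.rand(num_keypoints, 2) * 640        # synthetic keypoints in a 640x640 image
        obj = torch.rand(2) * 640                       # synthetic object center (e.g. a phone)
        frames.append(geometric_features(kps, obj))
    seq = torch.stack(frames).unsqueeze(0)              # (1, seq_len, feat_dim)

    model = DriverActionLSTM(feat_dim)
    logits, next_feats = model(seq)
    print(logits.shape, next_feats.shape)               # torch.Size([1, 5]) torch.Size([1, 34])
```

In practice, the classification head would be trained with a cross-entropy loss and the forecasting head with a regression loss on the next frame's features; how the paper weights or combines these objectives is not specified in the abstract.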