2022, Vol. 28, No. 68, pp. 500-503
This paper reports a vision-based safety system for construction sites. The proposed system employs a person classification model and an action classification model. Both models use two-dimensional skeletons detected in images from a monocular camera installed at the site. The former identifies each worker from a face image cropped around the detected eyes. The latter predicts each worker's action from a sequence of skeletons. The workers' positions are also calculated from the detected feet. The monocular camera thus serves efficiently as a multi-purpose sensor in the system. The labor of generating training data is also reduced by using three-dimensional skeletons captured with RGB-D cameras to train the action classification model.
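The abstract does not state how the detected feet are converted into site positions. One common approach for a fixed monocular camera, assumed here purely for illustration, is a ground-plane homography: a 3x3 matrix, calibrated beforehand, that maps image pixels lying on the ground to site coordinates. The function names, the keypoint format, and the example matrix below are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def image_to_ground(h: Matrix3, u: float, v: float) -> Point:
    """Map an image pixel (u, v) on the ground to site coordinates.

    `h` is an assumed, pre-calibrated 3x3 ground-plane homography.
    """
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        # The pixel lies on the horizon line of the ground plane.
        raise ValueError("pixel has no finite ground position")
    return x / w, y / w


def worker_position(h: Matrix3, left_foot: Point, right_foot: Point) -> Point:
    """Estimate a worker's position from the midpoint of both foot keypoints."""
    u = (left_foot[0] + right_foot[0]) / 2.0
    v = (left_foot[1] + right_foot[1]) / 2.0
    return image_to_ground(h, u, v)


# Hypothetical calibration: 1 pixel corresponds to 0.01 m, with no perspective.
H_EXAMPLE = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(worker_position(H_EXAMPLE, (100.0, 400.0), (120.0, 400.0)))
```

Using the midpoint of the two feet is one simple choice; a real system might instead use whichever foot is in contact with the ground, or smooth the estimate over several frames.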