2025 Volume 37 Issue 1 Pages 571-576
This paper investigates action segmentation for identifying the surgeon's technique in laparoscopic surgery videos, with the aim of supporting automatic evaluation of surgical skill. In particular, we focus on classifying gesture-level actions. We adopt MS-TCN and its enhanced version, MS-TCN++, as action classification models and search for the optimal numbers of prediction layers and refinement layers for each model. Evaluation experiments after this optimization show that the two models achieve nearly equivalent classification accuracy but exhibit different prediction tendencies. Based on this observation, we propose ensemble methods that integrate the classification results of both models. We demonstrate that a method based on the sum of the two models' prediction probabilities improves classification accuracy.
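The probability-sum ensemble can be illustrated with a minimal sketch. It assumes that each model outputs a per-frame class-probability array of shape (num_frames, num_classes), for example the softmax of its final stage; the abstract does not specify which outputs are summed. The function names and the toy probabilities below are hypothetical and made up for illustration. They are not results from the paper.

```python
import numpy as np


def ensemble_sum(probs_a: np.ndarray, probs_b: np.ndarray) -> np.ndarray:
    """Frame-wise ensemble: sum two models' class probabilities, then take argmax.

    Assumption: probs_a and probs_b are (num_frames, num_classes) arrays of
    per-frame probabilities (e.g. final-stage softmax of MS-TCN and MS-TCN++).
    Returns the predicted class index for each frame.
    """
    if probs_a.shape != probs_b.shape:
        raise ValueError("both models must output arrays of the same shape")
    summed = probs_a + probs_b          # combine the evidence from both models
    return np.argmax(summed, axis=1)    # pick the highest combined score per frame


def frame_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of frames whose predicted label matches the ground truth."""
    return float(np.mean(pred == target))


# Hypothetical toy example: 3 frames, 2 gesture classes.
probs_a = np.array([[0.8, 0.2], [0.45, 0.55], [0.2, 0.8]])
probs_b = np.array([[0.4, 0.6], [0.7, 0.3], [0.1, 0.9]])
target = np.array([0, 0, 1])

print(frame_accuracy(probs_a.argmax(axis=1), target))         # model A alone
print(frame_accuracy(probs_b.argmax(axis=1), target))         # model B alone
print(frame_accuracy(ensemble_sum(probs_a, probs_b), target))  # summed ensemble
```

In this toy case, each model is wrong on a different frame, but only by a small margin. Summing the probabilities lets the more confident model decide each frame. This mirrors the abstract's observation that the two models have similar accuracy but different prediction tendencies.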