Journal of the Japan Society for Precision Engineering (精密工学会誌)
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Special Issue on Practical Applications of Image Technology
Sign Language Recognition Based on Spatial-Temporal Graph Convolution-Transformer
Natsuki Takayama, Gibran Benitez-Garcia, Hiroki Takahashi

2021, Vol. 87, No. 12, pp. 1028-1035

Abstract

This paper reports on sign language recognition based on human body part tracking. Tracking-based sign language recognition has practical advantages, such as robustness against variations in clothing and scene backgrounds. However, there is still room for improving feature extraction in tracking-based sign language recognition. In this paper, a tracking-based continuous sign language word recognition method called Spatial-Temporal Graph Convolution-Transformer (STGC-Transformer) is presented. Spatial-temporal graph convolution is employed to improve framewise feature extraction from tracking points, while the Transformer enables the model to recognize word sequences of arbitrary length. Besides the model design, the training strategy also has an impact on recognition performance. Multi-task learning, which combines connectionist temporal classification (CTC) and cross-entropy losses, is employed to train the proposed method in this study. This training strategy improved the recognition performance by a significant margin. The proposed method was evaluated statistically using a sign language video dataset consisting of 275 types of isolated words and 120 types of sentences. The evaluation results show that the STGC-Transformer with multi-task learning achieved word error rates of 12.14% for isolated words and 2.07% for sentences.
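
The pipeline described in the abstract (graph convolution over tracked keypoints for framewise features, a Transformer for recognizing word sequences, and multi-task training with CTC and cross-entropy losses) can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the layer sizes, the adjacency matrix, the loss weighting, and the framewise application of the cross-entropy term are assumptions for illustration, and the spatial-temporal graph convolution is simplified to a single spatial aggregation step per frame.

# Minimal sketch under the assumptions stated above; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STGCTransformerSketch(nn.Module):
    def __init__(self, num_joints, in_dim, hidden_dim, num_classes, adjacency):
        super().__init__()
        # Normalized adjacency over tracked keypoints (assumed given), shape (J, J).
        self.register_buffer("A", adjacency)
        self.gcn = nn.Linear(in_dim, hidden_dim)                 # per-joint projection
        self.frame = nn.Linear(num_joints * hidden_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(hidden_dim, num_classes)     # includes the CTC blank

    def forward(self, x):
        # x: (B, T, J, C) keypoint coordinates from body part tracking
        B, T, J, C = x.shape
        h = torch.einsum("jk,btkc->btjc", self.A, x)             # spatial graph aggregation
        h = F.relu(self.gcn(h))                                  # (B, T, J, H)
        h = F.relu(self.frame(h.reshape(B, T, -1)))              # framewise feature (B, T, H)
        h = self.temporal(h)                                     # temporal modeling over frames
        return self.classifier(h)                                # per-frame logits (B, T, K)

def multitask_loss(logits, word_targets, frame_targets,
                   input_lengths, target_lengths, alpha=0.5):
    # CTC over the word sequence plus framewise cross-entropy;
    # the equal weighting (alpha = 0.5) is an assumption.
    log_probs = logits.log_softmax(-1).transpose(0, 1)           # (T, B, K) layout for CTC
    ctc = F.ctc_loss(log_probs, word_targets, input_lengths, target_lengths)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         frame_targets.reshape(-1))
    return alpha * ctc + (1.0 - alpha) * ce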

© 2021 The Japan Society for Precision Engineering