Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Selected Papers for Special Issue on Industrial Application of Image Processing
Sign Language Recognition Based on Spatial-Temporal Graph Convolution-Transformer
Natsuki TAKAYAMA, Gibran BENITEZ-GARCIA, Hiroki TAKAHASHI

2021 Volume 87 Issue 12 Pages 1028-1035

Abstract

This paper reports on sign language recognition based on human body part tracking. Tracking-based sign language recognition has practical advantages, such as robustness against variations in clothes and scene backgrounds. However, there is still room for improving feature extraction in tracking-based sign language recognition. In this paper, a tracking-based continuous sign language word recognition method called Spatial-Temporal Graph Convolution-Transformer (STGC-Transformer) is presented. Spatial-temporal graph convolution is employed to improve framewise feature extraction from tracking points, while the Transformer enables the model to recognize word sequences of arbitrary length. Besides the model design, the training strategy also affects recognition performance. Multi-task learning, which combines connectionist temporal classification and cross-entropy losses, is employed to train the proposed method in this study. This training strategy improved the recognition performance by a significant margin. The proposed method was evaluated statistically using a sign language video dataset consisting of 275 types of isolated words and 120 types of sentences. The evaluation results show that STGC-Transformer with multi-task learning achieved word error rates of 12.14% and 2.07% for isolated words and sentences, respectively.
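
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference and a recognized sequence, divided by the reference length. The function name `word_error_rate` and the sample word labels are hypothetical, and the abstract does not state the exact evaluation protocol used, so this should be read as an illustration of the standard definition rather than the authors' implementation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Illustrative sketch of the standard WER definition; the evaluation
    protocol of the paper itself is not specified in the abstract.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # distance from reference[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical word labels, used only for illustration.
reference = ["I", "go", "to", "school"]
hypothesis = ["I", "go", "school"]
print(word_error_rate(reference, hypothesis))
```

Because insertions are counted against the reference length, this rate can exceed 100% when the recognized sequence is much longer than the reference.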

© 2021 The Japan Society for Precision Engineering