The Journal of the Institute of Image Electronics Engineers of Japan
Online ISSN : 1348-0316
Print ISSN : 0285-9831
ISSN-L : 0285-9831
Contributed Paper
Skeletal-Coordinate-Based Sign Language Recognition by Integration of Multiple Transformer Encoders
Jiin TAKEDA, Youngha CHANG, Nobuhiko MUKAI

2024 Volume 53 Issue 3 Pages 166-172

Abstract

Sign language recognition is studied around the world, and methods based on RGB images are widely used. This approach, however, can suffer from degraded accuracy because the model also learns features contained in the background. In addition, methods that take the whole image as input cannot represent local features such as hand and arm movements. In this study, we therefore aim to improve the accuracy of sign language recognition with a skeleton-based deep learning model that integrates multiple transformer encoders, utilizes changes in the skeletal coordinates, and represents both global and local features. The skeletal coordinates obtained by MediaPipe are divided into four parts, and a model is trained individually for each part, giving four trained models. In experiments using the American Sign Language dataset WLASL (Word-Level American Sign Language) as the training data, the recognition accuracy of the proposed method exceeded that of color-based methods, confirming its effectiveness.
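
The data flow described in the abstract (split the skeletal coordinates into parts, use the coordinate change, combine the per-part models) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the four-way grouping into pose, left hand, right hand, and face (using the landmark counts of MediaPipe Holistic), the use of frame-to-frame differences as the "coordinate change", and the averaging of class probabilities as the integration step. The function names are hypothetical, and the transformer encoders themselves are omitted; the per-part logits stand in for their outputs.

```python
import numpy as np

# Landmark counts per part as produced by MediaPipe Holistic.
# ASSUMPTION: this four-way grouping is illustrative; the abstract does not
# state which four parts the authors use.
PART_SIZES = {"pose": 33, "left_hand": 21, "right_hand": 21, "face": 468}


def split_into_parts(landmarks):
    """Split a (frames, landmarks, 3) array into one array per body part."""
    expected = sum(PART_SIZES.values())
    if landmarks.ndim != 3 or landmarks.shape[1] != expected:
        raise ValueError(f"expected shape (frames, {expected}, 3)")
    parts, start = {}, 0
    for name, size in PART_SIZES.items():
        parts[name] = landmarks[:, start:start + size, :]
        start += size
    return parts


def coordinate_change(part):
    """Frame-to-frame displacement, flattened to (frames - 1, landmarks * 3).

    ASSUMPTION: 'skeletal coordinate change' is read as a first difference
    along the time axis.
    """
    delta = np.diff(part, axis=0)
    return delta.reshape(delta.shape[0], -1)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_predictions(part_logits):
    """Average per-part class probabilities and pick the best class.

    ASSUMPTION: simple late fusion; the paper's integration of the
    transformer encoders may work differently.
    """
    probs = np.mean([softmax(v) for v in part_logits.values()], axis=0)
    return int(np.argmax(probs)), probs
```

In this sketch, each entry of the dictionary returned by `split_into_parts` would be passed through `coordinate_change` and then to its own encoder, and the four sets of class scores would be handed to `fuse_predictions`. Averaging probabilities is only one plausible way to combine the models; concatenating encoder features before a shared classifier would be another.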

© 2024 The Institute of Image Electronics Engineers of Japan