A Dynamic Sign Language Recognition System Using an Attention Module Deployable on Edge Devices

孟 悦捷; 柳澤 政生; 史 又華

doi:10.11517/pjsai.JSAI2023.0_4Xin178

Abstract

In recent years, voice assistants such as Siri, which adopt artificial intelligence (AI) techniques, have made people's lives increasingly convenient. However, hearing-impaired people, especially those who cannot speak, are unable to benefit from this technology for physical reasons. Gesture recognition based on deep learning is a promising alternative for them. Many previous studies, however, used 3D-CNN or CNN+LSTM to recognize gestures from images or videos, which requires a large amount of memory. To solve this problem, this paper proposes DGT-STA, a Transformer-based gesture recognition model. The model achieves higher accuracy than 3D-CNN with a shallower neural network and reduces memory usage to 50.91% of that used by models with other attention modules. In addition, a Japanese Sign Language dataset is built to train and evaluate DGT-STA. Finally, this paper verifies that deploying DGT-STA on IoT edge devices is feasible.
