We propose a Japanese sign language recognition system that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). Existing research has had two problems. First, it has assumed that sign language can be recognized by extracting hand and arm positions and directions as features, although non-manual signals also play an important role in sign language. Second, it has segmented the temporal structure using hand velocity or hand-movement sections, an assumption that may discard the complex temporal structure of sign language. In this research, we created a dataset of videos of the upper bodies of sign language signers using Kinect v2. To extract effective features that include non-manual signals, we fed the visible (RGB) and depth images of the dataset into the CNN frame by frame. The extracted features were then fed into the LSTM frame by frame to capture the complex temporal structure of sign language. We trained the whole network end to end with the backpropagation algorithm. Comparing this CNN-LSTM model with baseline models suggests that it is more effective for sign language recognition.
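The sketch below illustrates the architecture the abstract describes: a per-frame CNN over stacked RGB and depth images feeding an LSTM over time, trained end to end. It is a minimal illustration in PyTorch, not the authors' implementation; the layer sizes, the 4-channel RGB-D stacking, the input resolution, and the number of sign classes are all assumptions, as the abstract does not specify them.

```python
# A minimal CNN-LSTM sketch under assumed hyperparameters; the paper
# does not specify layer sizes, input resolution, or class count.
import torch
import torch.nn as nn

class CNNLSTMSignRecognizer(nn.Module):
    def __init__(self, num_classes=20, feature_dim=256, hidden_dim=256):
        super().__init__()
        # Per-frame CNN: RGB (3 ch) + depth (1 ch) stacked into 4 channels
        # (the stacking is an assumption about how the two streams combine).
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),        # -> (batch*time, 64, 1, 1)
            nn.Flatten(),                   # -> (batch*time, 64)
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # The LSTM consumes the per-frame CNN features in temporal order.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 4, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(feats)           # (batch, time, hidden_dim)
        return self.classifier(out[:, -1])  # classify from the last step

# The whole network trains end to end with backpropagation:
model = CNNLSTMSignRecognizer()
video = torch.randn(2, 30, 4, 64, 64)       # 2 clips of 30 RGB-D frames
loss = nn.CrossEntropyLoss()(model(video), torch.tensor([0, 1]))
loss.backward()
```

Classifying from the final LSTM hidden state is one plausible reading of "capture the complex temporal structure"; per-frame outputs could equally be pooled over time.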