Today, almost every aspect of life is influenced by technology designed to simplify it. Research into how humans and computers interact is essential for building better models of human-computer interaction. With technologies such as computer vision, we can now collect the information we need directly from a person. Humans can use many kinds of modalities to interact with computers, and hands are perhaps the largest source of body-language information after the face. To interpret the meaning of a gesture, we can use MediaPipe Hands, developed by Google LLC, to track and recognize human hands. However, recognizing specific hand gestures with MediaPipe Hands alone requires hand-crafted if-else rules. In this research, we collected a wide variety of samples for each hand gesture, using the x, y, and z coordinates of the 21 key points as features. We chose a Support Vector Machine (SVM) and an Artificial Neural Network (ANN) as classifiers to validate the impact of a large sample dataset. We found that the ANN performed best among all the classifiers we evaluated on the 3D values of the 21 key points of the hand skeleton. The ANN achieved 98.4% accuracy and a 98.2% F1-score, the best per-class performance of all the methods used in this research.
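
The following is a minimal sketch, assuming a Python environment with the mediapipe, opencv-python, and scikit-learn packages, of how the 21 key points can be extracted as 63-dimensional (x, y, z) feature vectors and passed to an SVM and an ANN. The MLP architecture, hyperparameters, and variable names are illustrative assumptions, not the authors' exact pipeline.

    # Sketch: MediaPipe Hands key-point extraction + SVM/ANN classification.
    # The classifier settings below are placeholders, not the paper's models.
    import cv2
    import mediapipe as mp
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def extract_features(image_bgr, hands):
        """Return a 63-dim vector (21 key points x 3 coords) or None."""
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            return None  # no hand detected in this frame
        lm = results.multi_hand_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

    # X: (n_samples, 63) feature matrix; y: gesture labels (assembled
    # elsewhere from the collected dataset — hypothetical names here).
    # svm = SVC(kernel="rbf").fit(X_train, y_train)
    # ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X_train, y_train)
    # print("ANN test accuracy:", ann.score(X_test, y_test))

Because MediaPipe normalizes landmark x and y to the image frame, the flattened 63-value vector can be used directly as a feature, which is what makes a learned classifier a natural replacement for hand-written if-else rules.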