2024, Vol. 32, pp. 196-205
Hand gestures are communicative signals that emphasize important parts of an utterance and convey the concepts of the emphasized words. Iconic gestures are hand gestures that depict concrete actions, objects, or events mentioned in speech. In this study, we assume that the form of an iconic gesture is determined by the speaker's mental image of the object being described. On that basis, we propose a method for selecting iconic gesture forms from an image representation obtained from a set of pictures of the object. First, we asked annotators to select the gesture form that best expresses the meaning of a given word, based on the typical image and concept of that word in their minds. We also collected a set of pictures of each entity from the web and computed an average image representation from them. We then built a deep neural network (DNN) model that takes a set of pictures of an object as input and predicts the typical gesture form that people would choose for it. In the model evaluation experiment, our two-step gesture form selection method classified seven types of gesture forms with an accuracy of over 62%. Furthermore, we created character animations that perform the selected gestures and conducted a preliminary perception study to examine how human users perceive animated iconic gestures.
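The abstract does not describe the model architecture or the two selection steps in detail. The Python sketch below only illustrates the general idea of pooling several pictures of one object into a single averaged representation and scoring that representation against seven gesture-form classes. It assumes that per-picture feature vectors have already been extracted, for example by some pretrained image encoder. It also replaces the authors' DNN with a single linear layer followed by softmax. The function names, the weights, and the integer class labels 0 to 6 are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NUM_GESTURE_FORMS = 7  # the abstract reports seven gesture-form classes


def average_representation(features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-picture feature vectors (one vector per web picture)."""
    if not features:
        raise ValueError("need at least one picture")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_gesture_form(
    features: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[int, List[float]]:
    """Hypothetical linear classifier standing in for the DNN.

    `weights` has one row per gesture form; returns the index of the
    highest-scoring form and the probability for every form.
    """
    avg = average_representation(features)
    logits = [
        sum(w * x for w, x in zip(row, avg)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Three toy 2-D "picture features" whose average is [1.0, 1.0].
    pictures = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    # Toy weights: only the row for hypothetical class 3 responds to the input.
    weights = [[0.0, 0.0] for _ in range(NUM_GESTURE_FORMS)]
    weights[3] = [1.0, 1.0]
    bias = [0.0] * NUM_GESTURE_FORMS
    best, probs = predict_gesture_form(pictures, weights, bias)
    print(best)  # prints 3
```

Averaging before classification makes the prediction independent of how many pictures were collected and of their order. That matches the idea of a single "typical" image of the object, although the paper's actual pooling and network may differ.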