2016, Vol. 31, No. 6, p. AI30-E_1-12
This paper focuses on developing a model for estimating the communication skill of each participant in a group from multimodal (verbal and nonverbal) features. For this purpose, we use a multimodal group meeting corpus that includes audio signals and head motion sensor data of participants observed in 30 group meeting sessions. The corpus also includes the communication skill of each participant, assessed by 21 external observers with experience in human resource management. We extracted various kinds of features, such as spoken utterances, acoustic features, speaking turns, and the amount of head motion, to estimate the communication skill. First, we created a regression model that infers the level of communication skill from these features using support vector regression, and evaluated its estimation accuracy. Second, we created a binary (high or low) classification model using a support vector machine. Experimental results show that the multimodal model achieved an R² of 0.62 in regressing the overall skill score and an accuracy of 0.93 in classification. This paper reports the features that are effective in predicting the level of communication skill and shows that these features are also useful in characterizing the difference between participants with high-level communication skills and those without.
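The two-stage evaluation described above (continuous skill estimation with support vector regression, then high/low classification with a support vector machine) can be sketched as follows. This is a minimal illustration using scikit-learn, not the paper's actual pipeline: the feature matrix, skill scores, and hyperparameters are synthetic placeholders standing in for the corpus features (utterances, acoustics, turns, head motion) and the observer-assessed skill ratings.

```python
# Hedged sketch of SVR regression + SVM high/low classification,
# assuming scikit-learn. All data below is synthetic, standing in for
# the multimodal features and observer-assessed skill scores.
import numpy as np
from sklearn.svm import SVR, SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for 4 multimodal features (e.g. speaking time,
# acoustic statistics, turn count, amount of head motion).
X = rng.normal(size=(n, 4))
# Synthetic skill score: a weighted feature sum plus rating noise.
skill = X @ np.array([0.8, 0.5, 0.3, 0.1]) + rng.normal(scale=0.3, size=n)

# 1) Regression: estimate the continuous skill score with SVR,
#    reporting R^2 on a held-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, skill, random_state=0)
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
svr.fit(X_tr, y_tr)
r2 = r2_score(y_te, svr.predict(X_te))

# 2) Classification: binarize skill at the median into high/low
#    labels and classify with an SVM, reporting accuracy.
labels = (skill > np.median(skill)).astype(int)
Xc_tr, Xc_te, c_tr, c_te = train_test_split(X, labels, random_state=0)
svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svc.fit(Xc_tr, c_tr)
acc = accuracy_score(c_te, svc.predict(Xc_te))

print(f"R^2 = {r2:.2f}, accuracy = {acc:.2f}")
```

With a clear synthetic signal, both scores come out well above chance; in the paper the analogous figures on the real corpus are R² = 0.62 and accuracy = 0.93.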