Self-Supervised Learning Focusing on Temporal Consistency for Predicting Behavior During Dialogue

岡留 有哉; 阿多 健史郎; 石黒 浩; 中村 泰

doi:10.1527/tjsai.37-6_B-M43

Abstract

Communication agents that can interact mutually with humans have long been anticipated. Realizing such an agent requires real-time situation recognition and motion generation. Human-human interaction data is used to develop the recognition and generation models. However, labeling this data is costly, so the amount of labeled data tends to be small. One approach to this small-dataset problem is to obtain pre-trained weights through self-supervised learning. In this research, we propose estimating the amount of time shift introduced by a “lag operation” as a self-supervised learning task. During an interaction between two people, the data observed from one person is not independent of the other, and using the observations from both people lets an estimation model reduce uncertainty in situation detection. Exploiting these properties of interaction data, we shift the time index of one person's data, which breaks the entrainment between the two data streams. We call this the “lag operation” and define estimating the amount of time shift as the pre-training task. We apply this pre-training to an experiment on predicting near-future laughter during a conversation. The results show that the accuracy of laughter prediction improves by 1.3 points, indicating that the lag operation is effective for predicting changes in the interaction situation.
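As an illustration of the lag operation described in the abstract, the following is a minimal Python sketch of how one pre-training sample might be built. It is not the authors' implementation: the function name `lag_operation`, the parameters `max_lag` and `window`, the uniform sampling of the lag, and the framing of the task as classification over integer lags are all assumptions made for this example.

```python
import numpy as np


def lag_operation(person_a, person_b, max_lag, window, rng):
    """Build one hypothetical pre-training sample.

    person_a, person_b: arrays of shape (time, features), time-aligned.
    max_lag: largest shift (in frames) applied to person_b (assumed setting).
    window: number of frames in each returned segment (assumed setting).
    Returns (segment_a, shifted_segment_b, label), where label is the
    class index of the applied lag in the range 0 .. 2 * max_lag.
    """
    n = min(len(person_a), len(person_b))
    if n < window + 2 * max_lag:
        raise ValueError("sequence too short for this window and max_lag")

    # Sample the shift applied to person B's time index (assumption: uniform).
    lag = int(rng.integers(-max_lag, max_lag + 1))

    # Choose a start frame so that both segments stay inside the sequence.
    start = int(rng.integers(max_lag, n - window - max_lag + 1))

    segment_a = person_a[start:start + window]
    # Shifting B's index breaks the temporal alignment between A and B.
    segment_b = person_b[start + lag:start + lag + window]

    label = lag + max_lag  # map lag in [-max_lag, max_lag] to a class index
    return segment_a, segment_b, label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(500, 8))  # placeholder features for person A
    b = rng.normal(size=(500, 8))  # placeholder features for person B
    seg_a, seg_b, label = lag_operation(a, b, max_lag=10, window=50, rng=rng)
    print(seg_a.shape, seg_b.shape, label)
```

In such a setup, a model would receive both segments and be trained to predict `label`. The resulting weights could then initialize a downstream model such as the laughter predictor, although the abstract does not specify the architecture or how the shift is parameterized.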
