Abstract
This paper presents a novel computational approach for recognizing object manipulation behaviors performed by a humanoid robot. A time-delay deep autoencoder is applied to acquire multimodal feature vectors from multiple behavior patterns represented as visuomotor temporal sequences. Thanks to the high generalization capability of the deep autoencoder, the proposed mechanism generates reliable feature vectors even when the motion pattern inputs are degraded by noise. Our experimental results demonstrate that the acquired multimodal feature vectors mitigate the degradation of behavior recognition performance by exploiting, in a complementary manner, the visual information that accompanies the motion pattern inputs. Moreover, even when only the motion pattern inputs are available, the proposed multimodal integration mechanism still generates reliable feature vectors by internally retrieving the associated visual information.
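The time-delay autoencoder described above can be sketched as follows. This is a minimal illustrative implementation only: the input dimensions, window length, single hidden layer, and synthetic visuomotor sequence are all assumptions for demonstration, not the paper's actual architecture or data. The core idea shown is stacking a sliding window of concatenated vision and motor frames into one input vector and compressing it through a bottleneck to obtain a multimodal feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 10-D vision, 4-D motor,
# and a time-delay window of 5 frames stacked into one input vector.
VIS_DIM, MOT_DIM, WINDOW = 10, 4, 5
IN_DIM = (VIS_DIM + MOT_DIM) * WINDOW   # 70-D time-delay input
FEAT_DIM = 8                            # bottleneck feature vector size

def make_windows(seq, window=WINDOW):
    """Stack `window` consecutive multimodal frames into time-delay inputs."""
    return np.stack([seq[t:t + window].ravel()
                     for t in range(len(seq) - window + 1)])

# Synthetic visuomotor sequence: smooth sinusoidal "behavior" channels.
T = 200
t = np.linspace(0, 4 * np.pi, T)
vision = np.sin(t[:, None] * np.arange(1, VIS_DIM + 1) * 0.1)
motor = np.cos(t[:, None] * np.arange(1, MOT_DIM + 1) * 0.2)
seq = np.hstack([vision, motor])        # (T, VIS_DIM + MOT_DIM)
X = make_windows(seq)                   # (T - WINDOW + 1, IN_DIM)

# One-hidden-layer autoencoder (a shallow stand-in for the deep variant).
W1 = rng.normal(0, 0.1, (IN_DIM, FEAT_DIM))
b1 = np.zeros(FEAT_DIM)
W2 = rng.normal(0, 0.1, (FEAT_DIM, IN_DIM))
b2 = np.zeros(IN_DIM)

def forward(X):
    H = np.tanh(X @ W1 + b1)            # multimodal feature vectors
    Y = H @ W2 + b2                     # reconstruction of the input window
    return H, Y

# Train by plain gradient descent on the reconstruction error.
lr = 0.01
losses = []
for epoch in range(300):
    H, Y = forward(X)
    err = Y - X
    losses.append(float((err ** 2).mean()))
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)    # backprop through tanh encoder
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

H, Y = forward(X)
print("feature shape:", H.shape)
print("final reconstruction MSE:", float(((Y - X) ** 2).mean()))
```

After training, each row of `H` is a compact feature vector summarizing a short window of joint visuomotor history; because vision and motor channels are reconstructed from the same bottleneck, the code is forced to capture their correlations, which is the property the abstract attributes to the multimodal feature vectors.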