Abstract
Generating natural motion in robots is important for improving human-robot interaction. We aim to develop a tele-operation system in which the lip motion of a remote humanoid robot is automatically controlled from the operator's voice. In the present work, we propose a lip motion generation method in which the degrees of lip height and lip width are estimated from formant information extracted from the speech signal. The method requires calibration of only one parameter for speaker normalization, so there is no need for prior construction of user-dependent acoustic models. Lip height control is evaluated on two types of humanoid robots (Geminoid-F and Telenoid-R2). Subjective evaluation indicated that the proposed audio-based method generates lip motion with naturalness superior to that of vision-based and motion capture-based approaches. Partial lip width control was shown to further improve lip motion naturalness in Geminoid-F, which also has an actuator for stretching the lip corners. Issues regarding synchronization of the audio and motion streams, as well as online real-time processing, are also discussed.