Abstract
Users of a domestic service robot should be able to teach it novel objects through a simple procedure. In this paper, we propose a method for learning the images and names of objects shown by users. The object images are segmented from cluttered scenes by using motion attention. Because the object names are out of vocabulary, phoneme recognition is used to recognize them and voice conversion is used to synthesize them. In experiments with 120 everyday objects, the method achieved an accuracy of 91% for object recognition and 82% for word recognition. Furthermore, we implemented the proposed method on a physical robot, DiGORO, and evaluated its performance on RoboCup@Home's “Supermarket” task. The results show that DiGORO exceeded the highest score obtained in the RoboCup@Home 2009 competition.