Journal of the Robotics Society of Japan
Online ISSN : 1884-7145
Print ISSN : 0289-1824
ISSN-L : 0289-1824
Missing Feature Theory based Interface Between Sound Source Separation and Automatic Speech Recognition and Applying to Multiple Robots
Shunichi Yamamoto, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi G. Okuno

2005 Volume 23 Issue 6 Pages 743-751

Abstract
Robot audition is a critical technology for creating intelligent robots that operate in daily environments. To realize such a robot audition system, we have designed an interface between sound source separation and automatic speech recognition (ASR) based on missing feature theory. In this interface, features distorted by speech separation are detected in the input speech as missing features. The detected missing features are masked during recognition to avoid severe deterioration of recognition performance. Using this interface, we developed a robot audition system that recognizes multiple simultaneous speech signals. We also assessed its general applicability by implementing it on three different humanoids: Honda ASIMO, SIG2, and Replie of Kyoto University. Its general applicability was confirmed using three simultaneous speeches as a benchmark. With a triphone model and a 200-word vocabulary, the average word correct rates for three simultaneous speeches were 79.7%, 78.7%, and 82.7% for ASIMO, SIG2, and Replie, respectively.
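The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a per-feature distortion estimate is already available, a fixed threshold decides which features count as missing, and the recognizer scores frames with a diagonal Gaussian. The names `distortion_mask`, `masked_log_likelihood`, and the threshold value are hypothetical.

```python
import math


def distortion_mask(distortion, threshold=0.5):
    """Mark each feature as reliable (True) or missing (False).

    Assumption: `distortion` holds one non-negative estimate per feature,
    and anything at or above `threshold` is treated as missing.
    """
    return [d < threshold for d in distortion]


def masked_log_likelihood(feature, mask, mean, var):
    """Diagonal-Gaussian log-likelihood computed over reliable features only.

    Features flagged as missing contribute nothing to the score, so a badly
    distorted dimension cannot dominate the result.
    """
    total = 0.0
    for x, reliable, m, v in zip(feature, mask, mean, var):
        if not reliable:
            continue  # skip features detected as missing
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


# Toy frame: the second feature is heavily distorted by separation.
feature = [0.0, 5.0]
distortion = [0.1, 0.9]
mean = [0.0, 0.0]
var = [1.0, 1.0]

mask = distortion_mask(distortion)
masked = masked_log_likelihood(feature, mask, mean, var)
unmasked = masked_log_likelihood(feature, [True, True], mean, var)
```

In this toy frame the masked score is higher than the unmasked one, because the distorted second feature no longer penalizes the correct model. A hard binary mask is only one possible design; a soft mask with weights between 0 and 1 would be another.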
© The Robotics Society of Japan