Journal of the Robotics Society of Japan
Online ISSN : 1884-7145
Print ISSN : 0289-1824
ISSN-L : 0289-1824
Speech Recognition During Robot Motion Using MFT
Yoshitaka Nishimura, Mitsuru Ishizuka, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino

2007, Vol. 25, No. 8, pp. 1189-1198

Abstract
Automatic speech recognition (ASR) is essential for human-humanoid communication. One of the main problems with ASR on a humanoid is that the humanoid inevitably generates motor noises. These noises are easily captured by the humanoid's microphones because the noise sources are closer to the microphones than the target speech source, so the signal-to-noise ratio (SNR) of the input speech becomes quite low (sometimes less than 0 dB). However, these noises can be estimated by using information on the humanoid's motions and gestures. This paper proposes a method to improve ASR for a humanoid with motor noises by utilizing its motion/gesture information. The method consists of noise suppression and missing-feature-theory-based ASR (MFT-ASR). The proposed noise suppression technique is based on spectral subtraction, and white noise is added to blur the distortion introduced by the suppression. MFT-ASR improves recognition by masking unreliable acoustic features in the input sound, and the motion/gesture information is used to identify those unreliable features. Furthermore, we also evaluated the method in combination with an acoustic model adaptation technique, MLLR (Maximum Likelihood Linear Regression); unsupervised MLLR was used for the adaptation. We evaluated the proposed method through recognition of speech recorded using Honda ASIMO in a reverberant room. The noise data contained 34 kinds of noises: motor noises without motions, gesture noises, walking noises, and other kinds of noises. The experimental results show that the proposed method outperforms the conventional multi-condition training technique.
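The two processing stages named in the abstract — spectral subtraction with white-noise addition, and MFT-style masking of unreliable time-frequency bins — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function names, the floor level, the SNR threshold, and the use of a precomputed motor-noise power template (standing in for the motion/gesture-derived noise estimate) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_subtract_with_floor(noisy_power, noise_power, alpha=1.0, floor=1e-3):
    """Subtract an estimated motor-noise power spectrum from the noisy
    power spectrum, then add a small random white-noise floor to blur the
    'musical noise' distortion that plain subtraction leaves behind.
    (alpha and floor are illustrative values, not from the paper.)"""
    cleaned = np.maximum(noisy_power - alpha * noise_power, 0.0)
    return cleaned + floor * rng.random(cleaned.shape)

def mft_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Build a hard missing-feature mask: a bin is reliable (1.0) when its
    local SNR exceeds a threshold, unreliable (0.0) otherwise. In MFT-ASR
    the unreliable bins are down-weighted or skipped when computing acoustic
    likelihoods; the threshold criterion here is a simplification."""
    snr_db = 10.0 * np.log10((noisy_power + 1e-12) / (noise_power + 1e-12))
    return (snr_db > snr_threshold_db).astype(float)

# Toy example: two frequency bins, one dominated by speech, one by noise.
noisy = np.array([[4.0, 0.5]])   # observed power spectrum (1 frame x 2 bins)
noise = np.array([[1.0, 1.0]])   # motor-noise template for the current gesture
suppressed = spectral_subtract_with_floor(noisy, noise)
mask = mft_mask(noisy, noise)    # -> bin 0 reliable, bin 1 masked out
```

In the paper's setting, the noise template would be selected per motion/gesture from the humanoid's 34-entry noise database rather than assumed fixed as above.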
© The Robotics Society of Japan