Journal of Robotics and Mechatronics
Online ISSN : 1883-8049
Print ISSN : 0915-3942
ISSN-L : 0915-3942
Regular Papers
Simultaneous Execution of Dereverberation, Denoising, and Speaker Separation Using a Neural Beamformer for Adapting Robots to Real Environments
Daichi NaganoKazuo Nakazawa
著者情報
ジャーナル オープンアクセス

2022 年 34 巻 6 号 p. 1399-1410

詳細
抄録

It remains challenging for robots to accurately perform sound source localization and speech recognition in a real environment with reverberation, noise, and the voices of multiple speakers. Accordingly, we propose “U-TasNet-Beam,” a speech extraction method for extracting only the target speaker’s voice from all ambient sounds in a real environment. U-TasNet-Beam is a neural beamformer comprising three elements: a neural network for removing reverberation and noise, a second neural network for separating the voices of multiple speakers, and a minimum variance distortionless response (MVDR) beamformer. Experiments with simulated data and recorded data show that the proposed U-TasNet-Beam can improve the accuracy of sound source localization and speech recognition in robots compared to the conventional methods in a noisy, reverberant, and multi-speaker environment. In addition, we propose the spatial correlation matrix loss (SCM loss) as a loss function for the neural network learning the spatial information of the sound. By using the SCM loss, we can improve the speech extraction performance of the neural beamformer.

著者関連情報

この記事は最新の被引用情報を取得できません。

© 2022 Fuji Technology Press Ltd.

This article is licensed under a Creative Commons [Attribution-NoDerivatives 4.0 International] license (https://creativecommons.org/licenses/by-nd/4.0/).
The journal is fully Open Access under Creative Commons licenses and all articles are free to access at JRM official website.
https://www.fujipress.jp/jrobomech/rb-about/#https://creativecommons.org/licenses/by-nd
前の記事 次の記事
feedback
Top