IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Improve Multichannel Speech Recognition with Temporal and Spatial Information
Yu ZHANG, Pengyuan ZHANG, Qingwei ZHAO
Journal, free access

2018, Volume E101.D, Issue 7, pp. 1963-1967

Abstract

In this letter, we explore the use of spatio-temporal information in one unified framework to improve the performance of multichannel speech recognition. Generalized cross correlation (GCC) serves as spatial feature compensation, and an attention mechanism across time is embedded within long short-term memory (LSTM) neural networks. Experiments on the AMI meeting corpus show that the proposed method provides an 8.2% relative improvement in word error rate (WER) over the model trained directly on the concatenation of multiple microphone outputs.
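
As an illustration of the spatial feature mentioned in the abstract, the following is a minimal sketch of how a GCC vector between two microphone channels might be computed for one frame. The abstract does not state which GCC weighting the letter uses; the PHAT weighting here is an assumption, and the function name `gcc_phat` and the parameter `max_lag` are hypothetical names chosen for this example.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """Return GCC values for lags -max_lag..max_lag between frames x and y.

    PHAT weighting is assumed here; it is one common choice, not
    necessarily the one used in the letter.
    """
    n = len(x) + len(y)            # zero-pad so the correlation is not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12 # PHAT: keep phase only, discard magnitude
    cc = np.fft.irfft(cross, n=n)  # back to the lag domain
    # Reorder so that index 0 corresponds to lag -max_lag
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
```

With this convention, the index of the largest value minus `max_lag` estimates how many samples `x` lags `y` (negative when `y` is the delayed channel). A vector of this kind, computed per microphone pair and per frame, could be appended to the acoustic features as a spatial cue.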

© 2018 The Institute of Electronics, Information and Communication Engineers