IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions
Yue XIERuiyu LIANGZhenlin LIANGXiaoyan ZHAOWenhao ZENG
著者情報
ジャーナル フリー

2023 年 E106.D 巻 5 号 p. 1098-1101

詳細
抄録

To enhance the emotion feature and improve the performance of speech emotion recognition, an attention mechanism is employed to recognize the important information in both time and feature dimensions. In the time dimension, multi-heads attention is modified with the last state of the long short-term memory (LSTM)'s output to match the time accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention that refers to the method of the state update of LSTM to construct multi-heads attention. This means that a nonlinear change replaces the linear mapping in classical multi-heads attention. Experiments on IEMOCAP datasets demonstrate that the attention mechanism could enhance emotional information and improve the performance of speech emotion recognition.

著者関連情報
© 2023 The Institute of Electronics, Information and Communication Engineers
前の記事 次の記事
feedback
Top