IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508

A final published version of this article is available; please refer to the final published version. When citing, please cite the final published version as well.

A Multitask Learning Approach Based on Cascaded Attention Network And Self-Adaption Loss for Speech Emotion Recognition
Yang LIU, Yuqi XIA, Haoqin SUN, Xiaolei MENG, Jianxiong BAI, Wenbo GUAN, Zhen ZHAO, Yongwei LI
Journal / Free access / Advance online publication

Article ID: 2022EAP1091

Abstract

Speech emotion recognition (SER) has long been a complex and difficult task due to the complexity of emotion. In this paper, we propose a multitask deep learning approach based on a cascaded attention network and a self-adaption loss for SER. First, non-personalized features are extracted to represent the process of emotion change while reducing the influence of external variables. Second, to highlight salient speech emotion features, a cascaded attention network is proposed, in which spatial-temporal attention effectively locates the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence of differences in gender and in human perception of external information is alleviated by a multitask learning strategy, in which a self-adaption loss is introduced to determine the weights of the different tasks dynamically. Experimental results on the IEMOCAP dataset demonstrate that our method achieves absolute improvements of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
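To make the two components described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' exact architecture: a cascaded attention block (spatial-temporal attention followed by self-attention) and a dynamically weighted multitask loss. The paper does not specify implementation details here; the module names, the choice of gender classification as the auxiliary task, and the uncertainty-weighting form of the self-adaption loss are all illustrative assumptions.

```python
# Hypothetical sketch of a cascaded attention encoder and a self-adaptive
# multitask loss; details are assumptions, not the paper's specification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CascadedAttention(nn.Module):
    """Spatial-temporal attention followed by self-attention over frame features."""

    def __init__(self, feat_dim: int, n_heads: int = 4):
        super().__init__()
        # Spatial-temporal attention: score each time-feature position of the input map.
        self.st_score = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        # Self-attention across frames to model intra-utterance dependencies.
        self.self_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) non-personalized acoustic features
        st_weights = torch.sigmoid(self.st_score(x.unsqueeze(1))).squeeze(1)
        x = x * st_weights                      # emphasize emotion-salient regions
        attn_out, _ = self.self_attn(x, x, x)   # frame-to-frame self-attention
        return attn_out.mean(dim=1)             # utterance-level embedding


class SelfAdaptionLoss(nn.Module):
    """Dynamic task weighting via learnable log-variances (uncertainty weighting);
    an illustrative stand-in for the paper's self-adaption loss."""

    def __init__(self, n_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses: list) -> torch.Tensor:
        total = torch.zeros((), device=losses[0].device)
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total


if __name__ == "__main__":
    feats = torch.randn(8, 120, 64)              # (batch, frames, features)
    encoder = CascadedAttention(feat_dim=64)
    emo_head = nn.Linear(64, 4)                  # 4 emotion classes (common IEMOCAP subset)
    gen_head = nn.Linear(64, 2)                  # assumed auxiliary gender task
    criterion = SelfAdaptionLoss(n_tasks=2)

    emb = encoder(feats)
    emo_loss = F.cross_entropy(emo_head(emb), torch.randint(0, 4, (8,)))
    gen_loss = F.cross_entropy(gen_head(emb), torch.randint(0, 2, (8,)))
    print(criterion([emo_loss, gen_loss]).item())
```

In this sketch the learnable log-variances let the training process itself decide how much weight each task receives, which matches the abstract's description of weights being determined dynamically rather than hand-tuned.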

© 2022 The Institute of Electronics, Information and Communication Engineers